Dataset statistics
| Number of variables | 28 |
|---|---|
| Number of observations | 4726 |
| Missing cells | 4345 |
| Missing cells (%) | 3.3% |
| Duplicate rows | 1 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 1.0 MiB |
| Average record size in memory | 224.0 B |
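The overview counts above follow simple definitions. Below is a minimal stdlib sketch. It assumes rows arrive as equal-length tuples with `None` for a missing cell. `dataset_statistics` is a hypothetical helper and not part of ydata-profiling. The duplicate convention it uses (every repeat after the first) is also an assumption. The sketch then checks the reported 3.3% against 4345 missing cells out of 28 × 4726.

```python
from collections import Counter


def dataset_statistics(rows, n_vars):
    """Overview counts for a table given as a list of equal-length tuples.

    None marks a missing cell. Duplicate rows are counted as every
    occurrence of a row beyond its first (an assumed convention).
    """
    n_obs = len(rows)
    total_cells = n_vars * n_obs
    missing = sum(cell is None for row in rows for cell in row)
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": 100 * missing / total_cells if total_cells else 0.0,
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100 * duplicates / n_obs if n_obs else 0.0,
    }


rows = [(1, "a"), (2, None), (1, "a")]
print(dataset_statistics(rows, n_vars=2))

# The reported share: 4345 missing cells out of 28 * 4726 = 132328 cells.
print(f"{100 * 4345 / (28 * 4726):.1f}%")  # 3.3%
```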
Variable types
| Numeric | 19 |
|---|---|
| Text | 4 |
| Categorical | 5 |
Alerts
| Dataset has 1 (< 0.1%) duplicate rows | Duplicates |
|---|---|
| Source is highly imbalanced (51.3%) | Imbalance |
| MPAA Max is highly imbalanced (57.6%) | Imbalance |
| British Words has 4282 (90.6%) missing values | Missing |
| British WC has 4280 (90.6%) zeros | Zeros |
| SMOG Readability has 53 (1.1%) zeros | Zeros |
Reproduction
| Analysis started | 2023-12-30 05:48:35.794541 |
|---|---|
| Analysis finished | 2023-12-30 05:49:01.356516 |
| Duration | 25.56 seconds |
| Software version | ydata-profiling v4.6.3 |
| Download configuration | config.json |
ID
Real number (ℝ)
| Distinct | 4724 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4366.3472 |
| Minimum | 400 |
| Maximum | 8031 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 400 |
|---|---|
| 5-th percentile | 1215.15 |
| Q1 | 2769.75 |
| median | 4483.5 |
| Q3 | 5939.25 |
| 95-th percentile | 7239.7 |
| Maximum | 8031 |
| Range | 7631 |
| Interquartile range (IQR) | 3169.5 |
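The fractional quartiles (Q1 = 2769.75) fit linear interpolation between order statistics. That rule is the default in pandas `Series.quantile` and NumPy `quantile`. The sketch below implements it and derives the range and IQR from it. `quantile` and `quantile_statistics` are illustrative names. Dropping `None` before sorting is an assumption about how missing values are handled.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("quantile of empty data")
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def quantile_statistics(values):
    """Rows of the quantile table; missing values (None) are dropped first."""
    v = sorted(x for x in values if x is not None)
    q1, q3 = quantile(v, 0.25), quantile(v, 0.75)
    return {
        "Minimum": v[0],
        "5-th percentile": quantile(v, 0.05),
        "Q1": q1,
        "median": quantile(v, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(v, 0.95),
        "Maximum": v[-1],
        "Range": v[-1] - v[0],
        "Interquartile range (IQR)": q3 - q1,
    }


print(quantile_statistics([1, 2, 3, 4, None]))
```

The ID column follows the same arithmetic: IQR = 5939.25 - 2769.75 = 3169.5 and Range = 8031 - 400 = 7631.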
Descriptive statistics
| Standard deviation | 1896.3637 |
|---|---|
| Coefficient of variation (CV) | 0.43431354 |
| Kurtosis | -1.0449349 |
| Mean | 4366.3472 |
| Median Absolute Deviation (MAD) | 1606.5 |
| Skewness | -0.092433569 |
| Sum | 20626624 |
| Variance | 3596195.3 |
| Monotonicity | Strictly increasing |
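Several of these figures are tied together by simple identities. CV is the standard deviation divided by the mean. The variance is the squared standard deviation. MAD is the median of the absolute deviations from the median. The sketch below assumes the sample (n - 1) standard deviation, which pandas uses by default. Skewness and kurtosis are left out. `descriptive_statistics` is a hypothetical helper.

```python
import statistics


def descriptive_statistics(values):
    """Spread statistics for a numeric column; None marks a missing value."""
    v = [x for x in values if x is not None]
    mean = statistics.fmean(v)
    sd = statistics.stdev(v)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(v)
    return {
        "Standard deviation": sd,
        "Coefficient of variation (CV)": sd / mean,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - median) for x in v),
        "Sum": sum(v),
        "Variance": sd ** 2,
    }


print(descriptive_statistics([2, 4, 4, 4, 5, 5, 7, 9, None]))
# Cross-check against the ID column: CV = sd / mean, Variance = sd ** 2.
print(round(1896.3637 / 4366.3472, 8), round(1896.3637 ** 2, 1))  # 0.43431354 3596195.3
```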
| Value | Count | Frequency (%) |
| 5461 | 1 | < 0.1% |
| 5478 | 1 | < 0.1% |
| 5477 | 1 | < 0.1% |
| 5476 | 1 | < 0.1% |
| 5475 | 1 | < 0.1% |
| 5474 | 1 | < 0.1% |
| 5473 | 1 | < 0.1% |
| 5472 | 1 | < 0.1% |
| 5471 | 1 | < 0.1% |
| 5470 | 1 | < 0.1% |
| Other values (4714) | 4714 | 99.7% |
| (Missing) | 2 | < 0.1% |
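Frequencies in the common-values tables are shares of all 4726 rows, missing rows included. For Pub Year, 592 / 4726 gives 12.5%, while 592 / 4715 would give 12.6%. The sketch below shows how such a table can be built, including its "Other values (k)" and "(Missing)" rows. `pct` and `common_values` are hypothetical names. Tie order may differ from the report's, for example in the ID rows where every count is 1.

```python
from collections import Counter


def pct(count, total):
    """Format a share the way the report does, including the '< 0.1%' floor."""
    p = 100 * count / total
    return "< 0.1%" if 0 < p < 0.1 else f"{p:.1f}%"


def common_values(values, top=10):
    """Rows of a 'Common values' table: top values, 'Other values (k)', '(Missing)'.

    None marks a missing value. The denominator is the full row count,
    missing rows included.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    ranked = Counter(present).most_common()  # ties keep first-seen order
    rows = [(v, c, pct(c, n)) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, pct(other, n)))
    missing = n - len(present)
    if missing:
        rows.append(("(Missing)", missing, pct(missing, n)))
    return rows


for row in common_values(["a", "b", "a", None, "c"], top=2):
    print(row)
print(pct(592, 4726), pct(1, 4726))  # 12.5% < 0.1%
```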
| Value | Count | Frequency (%) |
| 400 | 1 | < 0.1% |
| 401 | 1 | < 0.1% |
| 402 | 1 | < 0.1% |
| 403 | 1 | < 0.1% |
| 404 | 1 | < 0.1% |
| 405 | 1 | < 0.1% |
| 406 | 1 | < 0.1% |
| 407 | 1 | < 0.1% |
| 408 | 1 | < 0.1% |
| 409 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 8031 | 1 | < 0.1% |
| 8030 | 1 | < 0.1% |
| 8029 | 1 | < 0.1% |
| 8028 | 1 | < 0.1% |
| 8027 | 1 | < 0.1% |
| 8026 | 1 | < 0.1% |
| 8025 | 1 | < 0.1% |
| 8024 | 1 | < 0.1% |
| 8023 | 1 | < 0.1% |
| 8022 | 1 | < 0.1% |
Author
Text
| Distinct | 2409 |
|---|---|
| Distinct (%) | 51.0% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
Length
| Max length | 267 |
|---|---|
| Median length | 167 |
| Mean length | 18.849069 |
| Min length | 1 |
Characters and Unicode
| Total characters | 89043 |
|---|---|
| Distinct characters | 100 |
| Distinct categories | 11 |
| Distinct scripts | 2 |
| Distinct blocks | 3 |
Unique
| Unique | 1947 |
|---|---|
| Unique (%) | 41.2% |
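Distinct counts the different non-missing values. Unique counts the values that occur exactly once. Both percentages in this report use the non-missing count as the denominator. The ID column's 4724 distinct values give 100.0%, whereas dividing by all 4726 rows would give > 99.9%. Below is a sketch with a hypothetical helper name.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different non-missing values;
    Unique = number of values that occur exactly once.

    Both percentages are taken over the non-missing count.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "Distinct": distinct,
        "Distinct (%)": 100 * distinct / n,
        "Unique": unique,
        "Unique (%)": 100 * unique / n,
    }


print(distinct_and_unique(["x", "x", "y", "z", None]))
# Author: 1947 values seen once out of 4724 present values.
print(f"{100 * 1947 / 4724:.1f}%")  # 41.2%
```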
Sample
| 1st row | Carolyn Wells |
|---|---|
| 2nd row | Carolyn Wells |
| 3rd row | Carolyn Wells |
| 4th row | CHARLES KINGSLEY |
| 5th row | Charles Kingsley |
| Value | Count | Frequency (%) |
| (not rendered) | 448 | 3.1% |
| wiki | 276 | 1.9% |
| simple | 275 | 1.9% |
| wikipedia | 274 | 1.9% |
| a | 191 | 1.3% |
| m | 168 | 1.2% |
| and | 148 | 1.0% |
| e | 141 | 1.0% |
| by | 141 | 1.0% |
| h | 141 | 1.0% |
| Other values (4315) | 12282 | 84.8% |
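The word table counts tokens, not whole values. The report does not state its tokenizer. The sketch below assumes one plausible rule: lowercase, split on whitespace, and trim punctuation at both ends. `word_counts` is hypothetical. ydata-profiling's own rules, including any stop-word handling, may differ. The rows of the Author table sum to 14485 words, which reproduces the 3.1% shown for the first row.

```python
import string
from collections import Counter

# Assumption: characters trimmed from both ends of each whitespace-split token.
STRIP = string.punctuation + "“”‘’"


def word_counts(texts):
    """Count words across a column of strings; None entries are skipped."""
    counts = Counter()
    for text in texts:
        if text is None:
            continue
        for token in text.lower().split():
            word = token.strip(STRIP)
            if word:
                counts[word] += 1
    return counts


counts = word_counts(["The Water-Babies", "the sea.", None])
print(counts.most_common())
# The Author rows above sum to 14485 words, so 448 words is 3.1%.
print(f"{100 * 448 / 14485:.1f}%")  # 3.1%
```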
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 9566 | 10.7% |
| e | 6928 | 7.8% |
| a | 6484 | 7.3% |
| i | 5786 | 6.5% |
| r | 4787 | 5.4% |
| n | 4430 | 5.0% |
| o | 3827 | 4.3% |
| l | 3530 | 4.0% |
| s | 3011 | 3.4% |
| t | 2996 | 3.4% |
| Other values (90) | 37698 | 42.3% |
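Each character table uses its own denominator. The overall table divides by all 89043 characters. The per-category tables divide by that category's own total. So `e` is 7.8% of all Author characters, while its 6928 occurrences are 11.9% of the 58083 lowercase letters. Below is a sketch of both views (`char_frequencies` is a hypothetical name).

```python
import unicodedata
from collections import Counter


def char_frequencies(text, category=None):
    """(char, count, percent) rows, most common first.

    category=None ranks all characters against the overall total; a
    general-category code such as "Ll" ranks only that category against
    its own total, as the per-category tables do.
    """
    chars = [ch for ch in text if category is None or unicodedata.category(ch) == category]
    total = len(chars)
    return [(ch, c, 100 * c / total) for ch, c in Counter(chars).most_common()]


print(char_frequencies("Aab a"))
print(char_frequencies("Aab a", category="Ll"))
# 'e' in Author: share of all characters vs. share of lowercase letters.
print(f"{100 * 6928 / 89043:.1f}% vs {100 * 6928 / 58083:.1f}%")  # 7.8% vs 11.9%
```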
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 58083 | 65.2% |
| Uppercase Letter | 16926 | 19.0% |
| Space Separator | 9566 | 10.7% |
| Other Punctuation | 3592 | 4.0% |
| Control | 387 | 0.4% |
| Decimal Number | 256 | 0.3% |
| Dash Punctuation | 139 | 0.2% |
| Open Punctuation | 31 | < 0.1% |
| Close Punctuation | 31 | < 0.1% |
| Math Symbol | 24 | < 0.1% |
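The category names are the long forms of the Unicode general-category codes that Python's `unicodedata.category` returns. For example, `Ll` is Lowercase Letter and `Zs` is Space Separator. The sketch below counts characters per category. Its name table covers only the categories that occur in this report. `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "No": "Other Number",
    "Zs": "Space Separator", "Cc": "Control", "Cf": "Format",
    "Pc": "Connector Punctuation", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pi": "Initial Punctuation", "Pf": "Final Punctuation",
    "Po": "Other Punctuation", "Sm": "Math Symbol", "Sc": "Currency Symbol",
    "Sk": "Modifier Symbol", "So": "Other Symbol",
}


def category_counts(text):
    """Count characters per Unicode general category, using long names."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


counts = category_counts("A b\u20191\n")
print(counts)
print("Distinct categories:", len(counts))  # Distinct categories: 6
```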
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 6928 | 11.9% |
| a | 6484 | 11.2% |
| i | 5786 | 10.0% |
| r | 4787 | 8.2% |
| n | 4430 | 7.6% |
| o | 3827 | 6.6% |
| l | 3530 | 6.1% |
| s | 3011 | 5.2% |
| t | 2996 | 5.2% |
| d | 2077 | 3.6% |
| Other values (34) | 14227 | 24.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 1453 | 8.6% |
| M | 1268 | 7.5% |
| S | 1246 | 7.4% |
| E | 1197 | 7.1% |
| C | 962 | 5.7% |
| H | 938 | 5.5% |
| R | 934 | 5.5% |
| L | 918 | 5.4% |
| B | 823 | 4.9% |
| T | 743 | 4.4% |
| Other values (19) | 6444 | 38.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 102 | 39.8% |
| 2 | 38 | 14.8% |
| 9 | 34 | 13.3% |
| 4 | 33 | 12.9% |
| 5 | 11 | 4.3% |
| 3 | 9 | 3.5% |
| 6 | 8 | 3.1% |
| 8 | 7 | 2.7% |
| 0 | 7 | 2.7% |
| 7 | 7 | 2.7% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 2299 | 64.0% |
| , | 715 | 19.9% |
| ? | 299 | 8.3% |
| & | 175 | 4.9% |
| ; | 48 | 1.3% |
| ' | 34 | 0.9% |
| " | 18 | 0.5% |
| : | 4 | 0.1% |
Math Symbol
| Value | Count | Frequency (%) |
| > | 8 | 33.3% |
| + | 8 | 33.3% |
| < | 8 | 33.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 9566 | 100.0% |
Control
| Value | Count | Frequency (%) |
| (control character) | 387 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 139 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 31 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 31 | 100.0% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 8 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 75009 | 84.2% |
| Common | 14034 | 15.8% |
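Python's standard library does not expose the Unicode Script property. The sketch below approximates it with an assumption that fits this column: letters are Latin, and everything else (spaces, digits, punctuation) is Common. The Latin total above, 75009, equals the 58083 lowercase plus 16926 uppercase letters, which fits that rule here. The rule would be wrong for text containing Greek, Cyrillic, or other scripts. `approx_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def approx_script(ch):
    """Crude stand-in for the Unicode Script property (an assumption).

    Letters are labelled "Latin" and everything else "Common"; this only
    holds for text without letters from other scripts.
    """
    return "Latin" if unicodedata.category(ch).startswith("L") else "Common"


print(Counter(map(approx_script, "Hi, 2\u00e9")))
```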
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 6928 | 9.2% |
| a | 6484 | 8.6% |
| i | 5786 | 7.7% |
| r | 4787 | 6.4% |
| n | 4430 | 5.9% |
| o | 3827 | 5.1% |
| l | 3530 | 4.7% |
| s | 3011 | 4.0% |
| t | 2996 | 4.0% |
| d | 2077 | 2.8% |
| Other values (63) | 31153 | 41.5% |
Common
| Value | Count | Frequency (%) |
| (space) | 9566 | 68.2% |
| . | 2299 | 16.4% |
| , | 715 | 5.1% |
| (control character) | 387 | 2.8% |
| ? | 299 | 2.1% |
| & | 175 | 1.2% |
| - | 139 | 1.0% |
| 1 | 102 | 0.7% |
| ; | 48 | 0.3% |
| 2 | 38 | 0.3% |
| Other values (17) | 266 | 1.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 88881 | 99.8% |
| None | 154 | 0.2% |
| Punctuation | 8 | < 0.1% |
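The block labels in this report appear to follow a reduced mapping. ASCII covers U+0000 to U+007F. Punctuation covers characters such as ’ (U+2019) from the General Punctuation block (U+2000 to U+206F). None covers everything else, including Latin-1 characters like é and °. The sketch below encodes that reading as an assumption; it is not the full Unicode Blocks table. `report_block` is a hypothetical name. For Author, 88881 + 154 + 8 = 89043, which matches the total character count.

```python
from collections import Counter


def report_block(ch):
    """Block label as it appears in this report (an assumed mapping).

    ASCII for U+0000..U+007F, "Punctuation" for the General Punctuation
    block U+2000..U+206F, and "None" for every other code point.
    """
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0x2000 <= cp <= 0x206F:
        return "Punctuation"
    return "None"


print(Counter(map(report_block, "Tom\u2019s caf\u00e9")))
```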
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 9566 | 10.8% |
| e | 6928 | 7.8% |
| a | 6484 | 7.3% |
| i | 5786 | 6.5% |
| r | 4787 | 5.4% |
| n | 4430 | 5.0% |
| o | 3827 | 4.3% |
| l | 3530 | 4.0% |
| s | 3011 | 3.4% |
| t | 2996 | 3.4% |
| Other values (68) | 37536 | 42.2% |
None
| Value | Count | Frequency (%) |
| é | 55 | 35.7% |
| í | 15 | 9.7% |
| á | 13 | 8.4% |
| ö | 12 | 7.8% |
| ñ | 8 | 5.2% |
| ä | 8 | 5.2% |
| ó | 7 | 4.5% |
| ü | 6 | 3.9% |
| è | 5 | 3.2% |
| ë | 4 | 2.6% |
| Other values (11) | 21 | 13.6% |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 8 | 100.0% |
Title
Text
| Distinct | 4658 |
|---|---|
| Distinct (%) | 98.6% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
Length
| Max length | 189 |
|---|---|
| Median length | 103 |
| Mean length | 27.384208 |
| Min length | 1 |
Characters and Unicode
| Total characters | 129363 |
|---|---|
| Distinct characters | 101 |
| Distinct categories | 13 |
| Distinct scripts | 2 |
| Distinct blocks | 3 |
Unique
| Unique | 4612 |
|---|---|
| Unique (%) | 97.6% |
Sample
| 1st row | Patty's Suitors |
|---|---|
| 2nd row | Two Little Women on a Holiday |
| 3rd row | Patty Blossom |
| 4th row | THE WATER-BABIES A Fairy Tale for a Land-Baby |
| 5th row | HOW THE ARGONAUTS WERE DRIVEN INTO THE UNKNOWN SEA |
| Value | Count | Frequency (%) |
| the | 2483 | 11.4% |
| of | 1006 | 4.6% |
| and | 653 | 3.0% |
| a | 549 | 2.5% |
| in | 332 | 1.5% |
| to | 243 | 1.1% |
| how | 195 | 0.9% |
| on | 159 | 0.7% |
| story | 146 | 0.7% |
| for | 129 | 0.6% |
| Other values (6670) | 15920 | 73.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 16822 | 13.0% |
| e | 9274 | 7.2% |
| o | 6148 | 4.8% |
| a | 5781 | 4.5% |
| n | 5463 | 4.2% |
| t | 5350 | 4.1% |
| i | 5315 | 4.1% |
| r | 5300 | 4.1% |
| T | 4179 | 3.2% |
| s | 4155 | 3.2% |
| Other values (91) | 61576 | 47.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 68795 | 53.2% |
| Uppercase Letter | 40432 | 31.3% |
| Space Separator | 16822 | 13.0% |
| Other Punctuation | 1836 | 1.4% |
| Dash Punctuation | 359 | 0.3% |
| Control | 353 | 0.3% |
| Connector Punctuation | 309 | 0.2% |
| Decimal Number | 220 | 0.2% |
| Final Punctuation | 114 | 0.1% |
| Close Punctuation | 42 | < 0.1% |
| Other values (3) | 81 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 9274 | 13.5% |
| o | 6148 | 8.9% |
| a | 5781 | 8.4% |
| n | 5463 | 7.9% |
| t | 5350 | 7.8% |
| i | 5315 | 7.7% |
| r | 5300 | 7.7% |
| s | 4155 | 6.0% |
| h | 3552 | 5.2% |
| l | 3040 | 4.4% |
| Other values (21) | 15417 | 22.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 4179 | 10.3% |
| E | 3695 | 9.1% |
| A | 3233 | 8.0% |
| S | 2887 | 7.1% |
| I | 2356 | 5.8% |
| O | 2349 | 5.8% |
| R | 2342 | 5.8% |
| H | 2240 | 5.5% |
| N | 2163 | 5.3% |
| C | 1729 | 4.3% |
| Other values (20) | 13259 | 32.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 467 | 25.4% |
| . | 364 | 19.8% |
| , | 295 | 16.1% |
| : | 275 | 15.0% |
| ? | 236 | 12.9% |
| " | 90 | 4.9% |
| ! | 63 | 3.4% |
| ; | 25 | 1.4% |
| & | 8 | 0.4% |
| / | 5 | 0.3% |
| Other values (2) | 8 | 0.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 65 | 29.5% |
| 2 | 32 | 14.5% |
| 0 | 24 | 10.9% |
| 9 | 20 | 9.1% |
| 3 | 20 | 9.1% |
| 4 | 17 | 7.7% |
| 8 | 15 | 6.8% |
| 7 | 13 | 5.9% |
| 5 | 7 | 3.2% |
| 6 | 7 | 3.2% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 3 | 37.5% |
| < | 2 | 25.0% |
| > | 2 | 25.0% |
| × | 1 | 12.5% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 313 | 87.2% |
| — | 40 | 11.1% |
| – | 6 | 1.7% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 84 | 73.7% |
| ” | 30 | 26.3% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 39 | 92.9% |
| ] | 3 | 7.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 39 | 92.9% |
| [ | 3 | 7.1% |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 30 | 96.8% |
| ‘ | 1 | 3.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 16822 | 100.0% |
Control
| Value | Count | Frequency (%) |
| (control character) | 353 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 309 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 109227 | 84.4% |
| Common | 20136 | 15.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 9274 | 8.5% |
| o | 6148 | 5.6% |
| a | 5781 | 5.3% |
| n | 5463 | 5.0% |
| t | 5350 | 4.9% |
| i | 5315 | 4.9% |
| r | 5300 | 4.9% |
| T | 4179 | 3.8% |
| s | 4155 | 3.8% |
| E | 3695 | 3.4% |
| Other values (51) | 54567 | 50.0% |
Common
| Value | Count | Frequency (%) |
| (space) | 16822 | 83.5% |
| ' | 467 | 2.3% |
| . | 364 | 1.8% |
| (control character) | 353 | 1.8% |
| - | 313 | 1.6% |
| _ | 309 | 1.5% |
| , | 295 | 1.5% |
| : | 275 | 1.4% |
| ? | 236 | 1.2% |
| " | 90 | 0.4% |
| Other values (30) | 612 | 3.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 129147 | 99.8% |
| Punctuation | 194 | 0.1% |
| None | 22 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 16822 | 13.0% |
| e | 9274 | 7.2% |
| o | 6148 | 4.8% |
| a | 5781 | 4.5% |
| n | 5463 | 4.2% |
| t | 5350 | 4.1% |
| i | 5315 | 4.1% |
| r | 5300 | 4.1% |
| T | 4179 | 3.2% |
| s | 4155 | 3.2% |
| Other values (74) | 61360 | 47.5% |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 84 | 43.3% |
| — | 40 | 20.6% |
| ” | 30 | 15.5% |
| “ | 30 | 15.5% |
| – | 6 | 3.1% |
| … | 3 | 1.5% |
| ‘ | 1 | 0.5% |
None
| Value | Count | Frequency (%) |
| æ | 5 | 22.7% |
| Æ | 5 | 22.7% |
| é | 4 | 18.2% |
| Ö | 2 | 9.1% |
| É | 1 | 4.5% |
| Î | 1 | 4.5% |
| ü | 1 | 4.5% |
| × | 1 | 4.5% |
| ö | 1 | 4.5% |
| à | 1 | 4.5% |
Source
Categorical
IMBALANCE 
| Distinct | 19 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
| gutenberg | 2916 |
|---|---|
| kids.frontiersin | 458 |
| commonlit | 296 |
| simple.wikipedia | 275 |
| wikipedia | 274 |
| Other values (14) | 505 |
Length
| Max length | 18 |
|---|---|
| Median length | 9 |
| Mean length | 10.768628 |
| Min length | 4 |
Characters and Unicode
| Total characters | 50871 |
|---|---|
| Distinct characters | 28 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | gutenberg |
|---|---|
| 2nd row | gutenberg |
| 3rd row | gutenberg |
| 4th row | gutenberg |
| 5th row | gutenberg |
Common Values
| Value | Count | Frequency (%) |
| gutenberg | 2916 | 61.7% |
| kids.frontiersin | 458 | 9.7% |
| commonlit | 296 | 6.3% |
| simple.wikipedia | 275 | 5.8% |
| wikipedia | 274 | 5.8% |
| africanstorybook | 250 | 5.3% |
| online-literature | 95 | 2.0% |
| digitallibrary | 61 | 1.3% |
| freekidsbooks | 50 | 1.1% |
| static.ehe.osu.edu | 16 | 0.3% |
| Other values (9) | 33 | 0.7% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| gutenberg | 2916 | |
| kids.frontiersin | 458 | 9.7% |
| commonlit | 296 | 6.3% |
| simple.wikipedia | 275 | 5.8% |
| wikipedia | 274 | 5.8% |
| africanstorybook | 250 | 5.3% |
| online-literature | 95 | 2.0% |
| digitallibrary | 61 | 1.3% |
| freekidsbooks | 50 | 1.1% |
| static.ehe.osu.edu | 16 | 0.3% |
| Other values (9) | 33 | 0.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 7573 | 14.9% |
| g | 5906 | 11.6% |
| r | 4703 | 9.2% |
| n | 4583 | 9.0% |
| i | 4318 | 8.5% |
| t | 4204 | 8.3% |
| b | 3294 | 6.5% |
| u | 3057 | 6.0% |
| o | 2067 | 4.1% |
| s | 1590 | 3.1% |
| Other values (18) | 9576 | 18.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 49963 | 98.2% |
| Other Punctuation | 805 | 1.6% |
| Dash Punctuation | 95 | 0.2% |
| Decimal Number | 8 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 7573 | 15.2% |
| g | 5906 | 11.8% |
| r | 4703 | 9.4% |
| n | 4583 | 9.2% |
| i | 4318 | 8.6% |
| t | 4204 | 8.4% |
| b | 3294 | 6.6% |
| u | 3057 | 6.1% |
| o | 2067 | 4.1% |
| s | 1590 | 3.2% |
| Other values (14) | 8668 | 17.3% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 4 | 50.0% |
| 2 | 4 | 50.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 805 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 95 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 49963 | 98.2% |
| Common | 908 | 1.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 7573 | 15.2% |
| g | 5906 | 11.8% |
| r | 4703 | 9.4% |
| n | 4583 | 9.2% |
| i | 4318 | 8.6% |
| t | 4204 | 8.4% |
| b | 3294 | 6.6% |
| u | 3057 | 6.1% |
| o | 2067 | 4.1% |
| s | 1590 | 3.2% |
| Other values (14) | 8668 | 17.3% |
Common
| Value | Count | Frequency (%) |
| . | 805 | 88.7% |
| - | 95 | 10.5% |
| 1 | 4 | 0.4% |
| 2 | 4 | 0.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 50871 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 7573 | 14.9% |
| g | 5906 | 11.6% |
| r | 4703 | 9.2% |
| n | 4583 | 9.0% |
| i | 4318 | 8.5% |
| t | 4204 | 8.3% |
| b | 3294 | 6.5% |
| u | 3057 | 6.0% |
| o | 2067 | 4.1% |
| s | 1590 | 3.1% |
| Other values (18) | 9576 | 18.8% |
Pub Year
Real number (ℝ)
| Distinct | 168 |
|---|---|
| Distinct (%) | 3.6% |
| Missing | 11 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1937.887 |
| Minimum | 1728 |
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 1728 |
|---|---|
| 5-th percentile | 1867 |
| Q1 | 1884 |
| median | 1915 |
| Q3 | 2016 |
| 95-th percentile | 2020 |
| Maximum | 2020 |
| Range | 292 |
| Interquartile range (IQR) | 132 |
Descriptive statistics
| Standard deviation | 60.506795 |
|---|---|
| Coefficient of variation (CV) | 0.031223078 |
| Kurtosis | -1.4246343 |
| Mean | 1937.887 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | 0.35347988 |
| Sum | 9137137 |
| Variance | 3661.0723 |
| Monotonicity | Not monotonic |
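Monotonicity compares consecutive values. ID is reported as strictly increasing, while Pub Year is not monotonic. Below is a sketch of that classification. Skipping `None` and the non-strict labels ("Increasing", "Decreasing") are assumptions, and `monotonicity` is a hypothetical helper.

```python
def monotonicity(values):
    """Classify a numeric sequence; None values are skipped (an assumption)."""
    v = [x for x in values if x is not None]
    pairs = list(zip(v, v[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


print(monotonicity([400, 401, None, 402]))  # Strictly increasing
print(monotonicity([1881, 1915, 1883]))  # Not monotonic
```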
| Value | Count | Frequency (%) |
| 2020 | 592 | 12.5% |
| 2019 | 241 | 5.1% |
| 2017 | 167 | 3.5% |
| 1915 | 160 | 3.4% |
| 1881 | 153 | 3.2% |
| 2018 | 151 | 3.2% |
| 1883 | 148 | 3.1% |
| 1882 | 138 | 2.9% |
| 1922 | 128 | 2.7% |
| 1914 | 120 | 2.5% |
| Other values (158) | 2717 | 57.5% |
| Value | Count | Frequency (%) |
| 1728 | 1 | < 0.1% |
| 1761 | 1 | < 0.1% |
| 1781 | 1 | < 0.1% |
| 1789 | 1 | < 0.1% |
| 1791 | 3 | < 0.1% |
| 1792 | 1 | < 0.1% |
| 1811 | 1 | < 0.1% |
| 1812 | 1 | < 0.1% |
| 1813 | 2 | < 0.1% |
| 1814 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2020 | 592 | 12.5% |
| 2019 | 241 | 5.1% |
| 2018 | 151 | 3.2% |
| 2017 | 167 | 3.5% |
| 2016 | 111 | 2.3% |
| 2015 | 95 | 2.0% |
| 2014 | 112 | 2.4% |
| 2013 | 39 | 0.8% |
| 2012 | 10 | 0.2% |
| 2011 | 7 | 0.1% |
Category
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
| Lit | 2420 |
|---|---|
| Info | 2304 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.4877223 |
| Min length | 3 |
Characters and Unicode
| Total characters | 16476 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Lit |
|---|---|
| 2nd row | Lit |
| 3rd row | Lit |
| 4th row | Lit |
| 5th row | Lit |
Common Values
| Value | Count | Frequency (%) |
| Lit | 2420 | 51.2% |
| Info | 2304 | 48.8% |
| (Missing) | 2 | < 0.1% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| lit | 2420 | 51.2% |
| info | 2304 | 48.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| L | 2420 | 14.7% |
| i | 2420 | 14.7% |
| t | 2420 | 14.7% |
| I | 2304 | 14.0% |
| n | 2304 | 14.0% |
| f | 2304 | 14.0% |
| o | 2304 | 14.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 11752 | 71.3% |
| Uppercase Letter | 4724 | 28.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 2420 | 20.6% |
| t | 2420 | 20.6% |
| n | 2304 | 19.6% |
| f | 2304 | 19.6% |
| o | 2304 | 19.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| L | 2420 | 51.2% |
| I | 2304 | 48.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 16476 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| L | 2420 | 14.7% |
| i | 2420 | 14.7% |
| t | 2420 | 14.7% |
| I | 2304 | 14.0% |
| n | 2304 | 14.0% |
| f | 2304 | 14.0% |
| o | 2304 | 14.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16476 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| L | 2420 | 14.7% |
| i | 2420 | 14.7% |
| t | 2420 | 14.7% |
| I | 2304 | 14.0% |
| n | 2304 | 14.0% |
| f | 2304 | 14.0% |
| o | 2304 | 14.0% |
Location
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
| mid | 3470 |
|---|---|
| start | 1024 |
| whole | 122 |
| end | 108 |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.485182 |
| Min length | 3 |
Characters and Unicode
| Total characters | 16464 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | mid |
|---|---|
| 2nd row | mid |
| 3rd row | mid |
| 4th row | mid |
| 5th row | mid |
Common Values
| Value | Count | Frequency (%) |
| mid | 3470 | 73.4% |
| start | 1024 | 21.7% |
| whole | 122 | 2.6% |
| end | 108 | 2.3% |
| (Missing) | 2 | < 0.1% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| mid | 3470 | |
| start | 1024 | 21.7% |
| whole | 122 | 2.6% |
| end | 108 | 2.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| d | 3578 | 21.7% |
| m | 3470 | 21.1% |
| i | 3470 | 21.1% |
| t | 2048 | 12.4% |
| s | 1024 | 6.2% |
| a | 1024 | 6.2% |
| r | 1024 | 6.2% |
| e | 230 | 1.4% |
| w | 122 | 0.7% |
| h | 122 | 0.7% |
| Other values (3) | 352 | 2.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 16464 | 100.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| d | 3578 | 21.7% |
| m | 3470 | 21.1% |
| i | 3470 | 21.1% |
| t | 2048 | 12.4% |
| s | 1024 | 6.2% |
| a | 1024 | 6.2% |
| r | 1024 | 6.2% |
| e | 230 | 1.4% |
| w | 122 | 0.7% |
| h | 122 | 0.7% |
| Other values (3) | 352 | 2.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 16464 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| d | 3578 | 21.7% |
| m | 3470 | 21.1% |
| i | 3470 | 21.1% |
| t | 2048 | 12.4% |
| s | 1024 | 6.2% |
| a | 1024 | 6.2% |
| r | 1024 | 6.2% |
| e | 230 | 1.4% |
| w | 122 | 0.7% |
| h | 122 | 0.7% |
| Other values (3) | 352 | 2.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16464 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| d | 3578 | 21.7% |
| m | 3470 | 21.1% |
| i | 3470 | 21.1% |
| t | 2048 | 12.4% |
| s | 1024 | 6.2% |
| a | 1024 | 6.2% |
| r | 1024 | 6.2% |
| e | 230 | 1.4% |
| w | 122 | 0.7% |
| h | 122 | 0.7% |
| Other values (3) | 352 | 2.1% |
MPAA Max
Categorical
IMBALANCE 
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
| G | 3706 |
|---|---|
| PG | 928 |
| PG-13 | 87 |
| R | 3 |
Length
| Max length | 5 |
|---|---|
| Median length | 1 |
| Mean length | 1.2701101 |
| Min length | 1 |
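For a categorical column, the length table can be derived from the value counts alone, because every occurrence contributes the length of its value. The sketch below reproduces the MPAA Max figures from the counts in this section: 6000 characters, mean 6000 / 4724 = 1.2701101, and median 1. It also reproduces the Category figures (16476 characters, mean 3.4877223). `length_statistics` is a hypothetical helper.

```python
import statistics


def length_statistics(value_counts):
    """Length table from {value: count}; every occurrence contributes its length."""
    lengths = [len(v) for v, c in value_counts.items() for _ in range(c)]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
        "Total characters": sum(lengths),
    }


mpaa = {"G": 3706, "PG": 928, "PG-13": 87, "R": 3}
stats = length_statistics(mpaa)
print(stats["Total characters"], round(stats["Mean length"], 7))  # 6000 1.2701101
```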
Characters and Unicode
| Total characters | 6000 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | G |
|---|---|
| 2nd row | PG |
| 3rd row | PG |
| 4th row | PG-13 |
| 5th row | PG |
Common Values
| Value | Count | Frequency (%) |
| G | 3706 | 78.4% |
| PG | 928 | 19.6% |
| PG-13 | 87 | 1.8% |
| R | 3 | 0.1% |
| (Missing) | 2 | < 0.1% |
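The imbalance percentages in the alerts fit an entropy-based score. The score is one minus the Shannon entropy of the category shares, divided by the log of the number of categories. Applied to the four MPAA Max counts above, it gives 57.6%, which matches the alert. The sketch below follows that reading. The formula is inferred from the numbers, and returning 0 for a single category is an assumption. `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """1 - (Shannon entropy / log k) over the k observed categories.

    0 means perfectly balanced; values near 1 mean one category dominates.
    """
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k <= 1:
        return 0.0  # assumption: a single category is treated as balanced
    n = sum(counts)
    entropy = -sum(c / n * math.log(c / n) for c in counts)
    return 1 - entropy / math.log(k)


score = imbalance_score([3706, 928, 87, 3])  # G, PG, PG-13, R
print(f"{score:.1%}")  # 57.6%
```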
Common Values (Plot)
| Value | Count | Frequency (%) |
| g | 3706 | |
| pg | 928 | 19.6% |
| pg-13 | 87 | 1.8% |
| r | 3 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 4721 | 78.7% |
| P | 1015 | 16.9% |
| - | 87 | 1.5% |
| 1 | 87 | 1.5% |
| 3 | 87 | 1.5% |
| R | 3 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 5739 | |
| Decimal Number | 174 | 2.9% |
| Dash Punctuation | 87 | 1.5% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| G | 4721 | 82.3% |
| P | 1015 | 17.7% |
| R | 3 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 87 | 50.0% |
| 3 | 87 | 50.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 87 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5739 | |
| Common | 261 | 4.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| G | 4721 | 82.3% |
| P | 1015 | 17.7% |
| R | 3 | 0.1% |
Common
| Value | Count | Frequency (%) |
| - | 87 | 33.3% |
| 1 | 87 | 33.3% |
| 3 | 87 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| G | 4721 | 78.7% |
| P | 1015 | 16.9% |
| - | 87 | 1.5% |
| 1 | 87 | 1.5% |
| 3 | 87 | 1.5% |
| R | 3 | < 0.1% |
Excerpt
Text
| Distinct | 4723 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
Length
| Max length | 1341 |
|---|---|
| Median length | 1112 |
| Mean length | 972.78345 |
| Min length | 667 |
Characters and Unicode
| Total characters | 4595429 |
|---|---|
| Distinct characters | 133 |
| Distinct categories | 18 |
| Distinct scripts | 2 |
| Distinct blocks | 3 |
Unique
| Unique | 4722 |
|---|---|
| Unique (%) | > 99.9% |
Sample
| 1st row | When the young people returned to the ballroom, it presented a decidedly changed appearance. Instead of an interior scene, it was a winter landscape. The floor was covered with snow-white canvas, not laid on smoothly, but rumpled over bumps and hillocks, like a real snow field. The numerous palms and evergreens that had decorated the room, were powdered with flour and strewn with tufts of cotton, like snow. Also diamond dust had been lightly sprinkled on them, and glittering crystal icicles hung from the branches. At each end of the room, on the wall, hung a beautiful bear-skin rug. These rugs were for prizes, one for the girls and one for the boys. And this was the game. The girls were gathered at one end of the room and the boys at the other, and one end was called the North Pole, and the other the South Pole. Each player was given a small flag which they were to plant on reaching the Pole. This would have been an easy matter, but each traveller was obliged to wear snowshoes. |
|---|---|
| 2nd row | All through dinner time, Mrs. Fayre was somewhat silent, her eyes resting on Dolly with a wistful, uncertain expression. She wanted to give the child the pleasure she craved, but she had hard work to bring herself to the point of overcoming her own objections. At last, however, when the meal was nearly over, she smiled at her little daughter, and said, "All right, Dolly, you may go." "Oh, mother!" Dolly cried, overwhelmed with sudden delight. "Really? Oh, I am so glad! Are you sure you're willing?" "I've persuaded myself to be willing, against my will," returned Mrs. Fayre, whimsically. "I confess I just hate to have you go, but I can't bear to deprive you of the pleasure trip. And, as you say, it would also keep Dotty at home, and so, altogether, I think I shall have to give in." "Oh, you angel mother! You blessed lady! How good you are!" And Dolly flew around the table and gave her mother a hug that nearly suffocated her. |
| 3rd row | As Roger had predicted, the snow departed as quickly as it came, and two days after their sleigh ride there was scarcely a vestige of white on the ground. Tennis was again possible and a great game was in progress on the court at Pine Laurel. Patty and Roger were playing against Elise and Sam Blaney, and the pairs were well matched. But the long-contested victory finally went against Patty, and she laughingly accepted defeat. "Only because Patty's not quite back on her game yet," Roger defended; "this child has been on the sick list, you know, Sam, and she isn't up to her own mark." "Well, I like that!" cried Patty; "suppose you bear half the blame, Roger. You see, Mr. Blaney, he is so absorbed in his own Love Game, he can't play with his old-time skill." "All right, Patsy, let it go at that. And it's so, too. I suddenly remembered something Mona told me to tell you, and it affected my service." |
| 4th row | Mr. Grimes was to come up next morning to Sir John Harthover's, at the Place, for his old chimney-sweep was gone to prison, and the chimneys wanted sweeping. And so he rode away, not giving Tom time to ask what the sweep had gone to prison for, which was a matter of interest to Tom, as he had been in prison once or twice himself. Moreover, the groom looked so very neat and clean, with his drab gaiters, drab breeches, drab jacket, snow-white tie with a smart pin in it, and clean round ruddy face, that Tom was offended and disgusted at his appearance, and considered him a stuck-up fellow, who gave himself airs because he wore smart clothes, and other people paid for them; and went behind the wall to fetch the half-brick after all; but did not, remembering that he had come in the way of business, and was, as it were, under a flag of truce. |
| 5th row | And outside before the palace a great garden was walled round, filled full of stately fruit-trees, gray olives and sweet figs, and pomegranates, pears, and apples, which bore the whole year round. For the rich south-west wind fed them, till pear grew ripe on pear, fig on fig, and grape on grape, all the winter and the spring. And at the farther end gay flower-beds bloomed through all seasons of the year; and two fair fountains rose, and ran, one through the garden grounds, and one beneath the palace gate, to water all the town. Such noble gifts the heavens had given to Alcinous the wise. So they went in, and saw him sitting, like Poseidon, on his throne, with his golden sceptre by him, in garments stiff with gold, and in his hand a sculptured goblet, as he pledged the merchant kings; and beside him stood Arete, his wise and lovely queen, and leaned against a pillar as she spun her golden threads. |
| Value | Count | Frequency (%) |
| the | 56066 | 6.8% |
| and | 28015 | 3.4% |
| of | 25157 | 3.1% |
| to | 21457 | 2.6% |
| a | 20137 | 2.5% |
| in | 15331 | 1.9% |
| was | 9619 | 1.2% |
| that | 8857 | 1.1% |
| is | 8825 | 1.1% |
| it | 8437 | 1.0% |
| Other values (40071) | 616942 | 75.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 807580 | 17.6% |
| e | 457138 | 9.9% |
| t | 324208 | 7.1% |
| a | 293591 | 6.4% |
| o | 271463 | 5.9% |
| n | 247803 | 5.4% |
| i | 234603 | 5.1% |
| s | 229706 | 5.0% |
| r | 217184 | 4.7% |
| h | 212700 | 4.6% |
| Other values (123) | 1299453 | 28.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3549288 | 77.2% |
| Space Separator | 807580 | 17.6% |
| Other Punctuation | 125119 | 2.7% |
| Uppercase Letter | 85116 | 1.9% |
| Decimal Number | 9873 | 0.2% |
| Dash Punctuation | 7356 | 0.2% |
| Control | 7290 | 0.2% |
| Close Punctuation | 1681 | < 0.1% |
| Open Punctuation | 1681 | < 0.1% |
| Initial Punctuation | 135 | < 0.1% |
| Other values (8) | 310 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 457138 | 12.9% |
| t | 324208 | 9.1% |
| a | 293591 | 8.3% |
| o | 271463 | 7.6% |
| n | 247803 | 7.0% |
| i | 234603 | 6.6% |
| s | 229706 | 6.5% |
| r | 217184 | 6.1% |
| h | 212700 | 6.0% |
| l | 149711 | 4.2% |
| Other values (38) | 911181 | 25.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 12969 | 15.2% |
| I | 10272 | 12.1% |
| A | 7247 | 8.5% |
| S | 6221 | 7.3% |
| H | 4951 | 5.8% |
| W | 4381 | 5.1% |
| B | 4341 | 5.1% |
| M | 4250 | 5.0% |
| C | 3668 | 4.3% |
| E | 2816 | 3.3% |
| Other values (19) | 24000 | 28.2% |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 55380 | 44.3% |
| . | 43132 | 34.5% |
| " | 11634 | 9.3% |
| ' | 5571 | 4.5% |
| ; | 4121 | 3.3% |
| ! | 2214 | 1.8% |
| ? | 1699 | 1.4% |
| : | 1083 | 0.9% |
| % | 106 | 0.1% |
| / | 106 | 0.1% |
| Other values (6) | 73 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2370 | 24.0% |
| 1 | 2047 | 20.7% |
| 2 | 1025 | 10.4% |
| 9 | 746 | 7.6% |
| 5 | 721 | 7.3% |
| 8 | 700 | 7.1% |
| 3 | 667 | 6.8% |
| 4 | 560 | 5.7% |
| 6 | 527 | 5.3% |
| 7 | 510 | 5.2% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 17 | 37.8% |
| = | 8 | 17.8% |
| ~ | 7 | 15.6% |
| × | 7 | 15.6% |
| < | 3 | 6.7% |
| ± | 2 | 4.4% |
| ÷ | 1 | 2.2% |
Other Number
| Value | Count | Frequency (%) |
| ½ | 34 | 73.9% |
| ¼ | 6 | 13.0% |
| ¹ | 4 | 8.7% |
| ¾ | 2 | 4.3% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5689 | 77.3% |
| — | 1495 | 20.3% |
| – | 172 | 2.3% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1671 | 99.4% |
| ] | 10 | 0.6% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 1671 | 99.4% |
| [ | 10 | 0.6% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‘ | 129 | 95.6% |
| “ | 6 | 4.4% |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 72 | 86.7% |
| £ | 11 | 13.3% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 43 | 87.8% |
| ” | 6 | 12.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 807580 | 100.0% |
Control
| Value | Count | Frequency (%) |
| (control character) | 7290 | 100.0% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 76 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 9 | 100.0% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 1 | 100.0% |
Format
| Value | Count | Frequency (%) |
| (invisible format character) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 3634401 | 79.1% |
| Common | 961028 | 20.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 457138 | 12.6% |
| t | 324208 | 8.9% |
| a | 293591 | 8.1% |
| o | 271463 | 7.5% |
| n | 247803 | 6.8% |
| i | 234603 | 6.5% |
| s | 229706 | 6.3% |
| r | 217184 | 6.0% |
| h | 212700 | 5.9% |
| l | 149711 | 4.1% |
| Other values (66) | 996294 | 27.4% |
Common
| Value | Count | Frequency (%) |
| (space) | 807580 | 84.0% |
| , | 55380 | 5.8% |
| . | 43132 | 4.5% |
| " | 11634 | 1.2% |
| (control character) | 7290 | 0.8% |
| - | 5689 | 0.6% |
| ' | 5571 | 0.6% |
| ; | 4121 | 0.4% |
| 0 | 2370 | 0.2% |
| ! | 2214 | 0.2% |
| Other values (47) | 16047 | 1.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4593106 | 99.9% |
| Punctuation | 1897 | < 0.1% |
| None | 426 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 807580 | 17.6% |
| e | 457138 | 10.0% |
| t | 324208 | 7.1% |
| a | 293591 | 6.4% |
| o | 271463 | 5.9% |
| n | 247803 | 5.4% |
| i | 234603 | 5.1% |
| s | 229706 | 5.0% |
| r | 217184 | 4.7% |
| h | 212700 | 4.6% |
| Other values (78) | 1297130 | 28.2% |
Punctuation
| Value | Count | Frequency (%) |
| — | 1495 | 78.8% |
| – | 172 | 9.1% |
| ‘ | 129 | 6.8% |
| ’ | 43 | 2.3% |
| … | 31 | 1.6% |
| • | 15 | 0.8% |
| “ | 6 | 0.3% |
| ” | 6 | 0.3% |
None
| Value | Count | Frequency (%) |
| é | 78 | 18.3% |
| ° | 76 | 17.8% |
| æ | 37 | 8.7% |
| ½ | 34 | 8.0% |
| ö | 22 | 5.2% |
| á | 20 | 4.7% |
| Æ | 14 | 3.3% |
| è | 14 | 3.3% |
| œ | 11 | 2.6% |
| £ | 11 | 2.6% |
| Other values (27) | 109 | 25.6% |
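The category labels in these character tables correspond to Unicode general categories: Lowercase Letter is `Ll`, Dash Punctuation is `Pd`, Math Symbol is `Sm`, Other Number is `No`, Space Separator is `Zs`, and Control is `Cc`. A minimal Python sketch that tallies characters this way with the standard-library `unicodedata` module; `category_counts` and the partial `CATEGORY_NAMES` mapping are illustrative helpers, not the profiler's own code:

```python
import unicodedata
from collections import Counter

# Partial, illustrative mapping from Unicode general-category codes to the report's labels.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "No": "Other Number",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
    "Cc": "Control",
}


def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category, using report-style labels where known."""
    counts: Counter = Counter()
    for ch in text:
        code = unicodedata.category(ch)  # e.g. "Ll", "Pd", "Zs"
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample_counts = category_counts("A-b 1½×\n")
```

Categories missing from the mapping fall back to their two-letter code, for example `Pf` for the closing quote ’.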
Google WC
Real number (ℝ)
| Distinct | 76 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 171.9602 |
| Minimum | 125 |
|---|---|
| Maximum | 205 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 125 |
|---|---|
| 5-th percentile | 143 |
| Q1 | 158 |
| median | 174 |
| Q3 | 186 |
| 95-th percentile | 197 |
| Maximum | 205 |
| Range | 80 |
| Interquartile range (IQR) | 28 |
Descriptive statistics
| Standard deviation | 16.988921 |
|---|---|
| Coefficient of variation (CV) | 0.098795656 |
| Kurtosis | -0.99875435 |
| Mean | 171.9602 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | -0.25087257 |
| Sum | 812340 |
| Variance | 288.62344 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 179 | 107 | 2.3% |
| 197 | 105 | 2.2% |
| 189 | 103 | 2.2% |
| 161 | 102 | 2.2% |
| 177 | 101 | 2.1% |
| 185 | 99 | 2.1% |
| 191 | 99 | 2.1% |
| 175 | 98 | 2.1% |
| 192 | 97 | 2.1% |
| 182 | 97 | 2.1% |
| Other values (66) | 3716 | 78.6% |
| Value | Count | Frequency (%) |
| 125 | 1 | < 0.1% |
| 126 | 1 | < 0.1% |
| 129 | 1 | < 0.1% |
| 132 | 1 | < 0.1% |
| 133 | 1 | < 0.1% |
| 134 | 3 | 0.1% |
| 135 | 5 | 0.1% |
| 136 | 5 | 0.1% |
| 137 | 12 | 0.3% |
| 138 | 16 | 0.3% |
| Value | Count | Frequency (%) |
| 205 | 2 | < 0.1% |
| 203 | 3 | 0.1% |
| 202 | 7 | 0.1% |
| 201 | 11 | 0.2% |
| 200 | 25 | 0.5% |
| 199 | 52 | 1.1% |
| 198 | 61 | 1.3% |
| 197 | 105 | 2.2% |
| 196 | 70 | 1.5% |
| 195 | 75 | 1.6% |
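The descriptive statistics above follow standard definitions: the coefficient of variation is the standard deviation divided by the mean (16.988921 / 171.9602 ≈ 0.0988 for Google WC), the IQR is Q3 minus Q1 (186 - 158 = 28), and the range is the maximum minus the minimum (205 - 125 = 80). A minimal Python sketch that recomputes these for a plain list of numbers, assuming a sample (n - 1) standard deviation and linearly interpolated quantiles; the five-value input is a toy list built from the reported minimum, quartiles and maximum, not the actual column:

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Recompute a few profile-style summary statistics (assumes no missing values)."""
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    # method="inclusive" interpolates linearly between order statistics
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mad = statistics.median(abs(x - median) for x in data)  # median absolute deviation
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "mad": mad,
    }


# Toy input: Google WC's minimum, Q1, median, Q3 and maximum (not the real column).
toy = describe([125, 158, 174, 186, 205])
```

On the real column the same function would need the two missing values dropped first, since the profile excludes them from these statistics.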
Joon WC v1
Real number (ℝ)
| Distinct | 86 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 176.92549 |
| Minimum | 135 |
|---|---|
| Maximum | 220 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 135 |
|---|---|
| 5-th percentile | 146 |
| Q1 | 162 |
| median | 178 |
| Q3 | 191 |
| 95-th percentile | 204 |
| Maximum | 220 |
| Range | 85 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 18.173592 |
|---|---|
| Coefficient of variation (CV) | 0.1027189 |
| Kurtosis | -0.86274809 |
| Mean | 176.92549 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | -0.13168323 |
| Sum | 835796 |
| Variance | 330.27943 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 193 | 100 | 2.1% |
| 177 | 100 | 2.1% |
| 190 | 97 | 2.1% |
| 184 | 97 | 2.1% |
| 198 | 97 | 2.1% |
| 188 | 96 | 2.0% |
| 189 | 96 | 2.0% |
| 196 | 95 | 2.0% |
| 183 | 93 | 2.0% |
| 185 | 93 | 2.0% |
| Other values (76) | 3760 | 79.6% |
| Value | Count | Frequency (%) |
| 135 | 3 | 0.1% |
| 136 | 3 | 0.1% |
| 137 | 1 | < 0.1% |
| 138 | 3 | 0.1% |
| 139 | 1 | < 0.1% |
| 140 | 24 | 0.5% |
| 141 | 24 | 0.5% |
| 142 | 32 | 0.7% |
| 143 | 34 | 0.7% |
| 144 | 34 | 0.7% |
| Value | Count | Frequency (%) |
| 220 | 5 | 0.1% |
| 219 | 5 | 0.1% |
| 218 | 5 | 0.1% |
| 217 | 6 | 0.1% |
| 216 | 8 | 0.2% |
| 215 | 8 | 0.2% |
| 214 | 6 | 0.1% |
| 213 | 14 | 0.3% |
| 212 | 12 | 0.3% |
| 211 | 13 | 0.3% |
British WC
Real number (ℝ)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.12933954 |
| Minimum | 0 |
|---|---|
| Maximum | 9 |
| Zeros | 4280 |
| Zeros (%) | 90.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.47104911 |
|---|---|
| Coefficient of variation (CV) | 3.6419574 |
| Kurtosis | 51.392443 |
| Mean | 0.12933954 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.6099155 |
| Sum | 611 |
| Variance | 0.22188726 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4280 | 90.6% |
| 1 | 330 | 7.0% |
| 2 | 78 | 1.7% |
| 3 | 29 | 0.6% |
| 5 | 3 | 0.1% |
| 4 | 2 | < 0.1% |
| 9 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| (Missing) | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 4280 | 90.6% |
| 1 | 330 | 7.0% |
| 2 | 78 | 1.7% |
| 3 | 29 | 0.6% |
| 4 | 2 | < 0.1% |
| 5 | 3 | 0.1% |
| 6 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 5 | 3 | 0.1% |
| 4 | 2 | < 0.1% |
| 3 | 29 | 0.6% |
| 2 | 78 | 1.7% |
| 1 | 330 | 7.0% |
| 0 | 4280 | 90.6% |
British Words
Text
| Distinct | 250 |
|---|---|
| Distinct (%) | 56.3% |
| Missing | 4282 |
| Missing (%) | 90.6% |
| Memory size | 37.0 KiB |
Length
| Max length | 65 |
|---|---|
| Median length | 35 |
| Mean length | 11.31982 |
| Min length | 3 |
Characters and Unicode
| Total characters | 5026 |
|---|---|
| Distinct characters | 27 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 173 |
|---|---|
| Unique (%) | 39.0% |
Sample
| 1st row | traveller |
|---|---|
| 2nd row | sceptre |
| 3rd row | grey |
| 4th row | aeroplane |
| 5th row | axe |
| Value | Count | Frequency (%) |
| grey | 34 | 5.6% |
| travelled | 28 | 4.6% |
| colour | 20 | 3.3% |
| metres | 18 | 2.9% |
| centre | 15 | 2.5% |
| axe | 11 | 1.8% |
| travelling | 11 | 1.8% |
| theatre | 11 | 1.8% |
| mould | 10 | 1.6% |
| kilometres | 10 | 1.6% |
| Other values (186) | 443 | 72.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 644 | 12.8% |
| r | 515 | 10.2% |
| l | 410 | 8.2% |
| o | 367 | 7.3% |
| a | 324 | 6.4% |
| u | 292 | 5.8% |
| s | 273 | 5.4% |
| t | 237 | 4.7% |
| i | 234 | 4.7% |
| n | 189 | 3.8% |
| Other values (17) | 1541 | 30.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4692 | 93.4% |
| Other Punctuation | 167 | 3.3% |
| Space Separator | 167 | 3.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 644 | 13.7% |
| r | 515 | 11.0% |
| l | 410 | 8.7% |
| o | 367 | 7.8% |
| a | 324 | 6.9% |
| u | 292 | 6.2% |
| s | 273 | 5.8% |
| t | 237 | 5.1% |
| i | 234 | 5.0% |
| n | 189 | 4.0% |
| Other values (15) | 1207 | 25.7% |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 167 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 167 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 4692 | 93.4% |
| Common | 334 | 6.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 644 | 13.7% |
| r | 515 | 11.0% |
| l | 410 | 8.7% |
| o | 367 | 7.8% |
| a | 324 | 6.9% |
| u | 292 | 6.2% |
| s | 273 | 5.8% |
| t | 237 | 5.1% |
| i | 234 | 5.0% |
| n | 189 | 4.0% |
| Other values (15) | 1207 | 25.7% |
Common
| Value | Count | Frequency (%) |
| , | 167 | 50.0% |
| (space) | 167 | 50.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5026 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 644 | 12.8% |
| r | 515 | 10.2% |
| l | 410 | 8.2% |
| o | 367 | 7.3% |
| a | 324 | 6.4% |
| u | 292 | 5.8% |
| s | 273 | 5.4% |
| t | 237 | 4.7% |
| i | 234 | 4.7% |
| n | 189 | 3.8% |
| Other values (17) | 1541 | 30.7% |
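British Words and British WC appear to be linked: 444 rows have a non-missing British Words value (4726 - 4282), 444 rows have a nonzero British WC (4726 - 2 missing - 4280 zeros), and one word per non-missing row plus the 167 commas in British Words gives 611, which equals the British WC sum. A minimal sketch of that relationship, assuming (not confirmed by the report) that British WC is simply the number of comma-separated entries in British Words; `british_word_count` is a hypothetical helper:

```python
from typing import Optional


def british_word_count(british_words: Optional[str]) -> int:
    """Hypothetical: count comma-separated entries; missing or blank values count as zero.

    Assumes British WC is derived this way from British Words.
    """
    if british_words is None:
        return 0
    return sum(1 for word in british_words.split(",") if word.strip())


first_row = british_word_count("traveller")  # the sample row has British WC = 1
```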
Sentence Count v1
Real number (ℝ)
| Distinct | 38 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.5707028 |
| Minimum | 2 |
|---|---|
| Maximum | 41 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 7 |
| median | 8 |
| Q3 | 11 |
| 95-th percentile | 19 |
| Maximum | 41 |
| Range | 39 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 4.6401616 |
|---|---|
| Coefficient of variation (CV) | 0.48482977 |
| Kurtosis | 5.2743914 |
| Mean | 9.5707028 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.8963455 |
| Sum | 45212 |
| Variance | 21.5311 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8 | 676 | 14.3% |
| 7 | 651 | 13.8% |
| 6 | 563 | 11.9% |
| 9 | 501 | 10.6% |
| 10 | 405 | 8.6% |
| 5 | 338 | 7.2% |
| 11 | 301 | 6.4% |
| 12 | 232 | 4.9% |
| 13 | 155 | 3.3% |
| 4 | 135 | 2.9% |
| Other values (28) | 767 | 16.2% |
| Value | Count | Frequency (%) |
| 2 | 19 | 0.4% |
| 3 | 56 | 1.2% |
| 4 | 135 | 2.9% |
| 5 | 338 | 7.2% |
| 6 | 563 | 11.9% |
| 7 | 651 | 13.8% |
| 8 | 676 | 14.3% |
| 9 | 501 | 10.6% |
| 10 | 405 | 8.6% |
| 11 | 301 | 6.4% |
| Value | Count | Frequency (%) |
| 41 | 1 | < 0.1% |
| 39 | 1 | < 0.1% |
| 38 | 2 | < 0.1% |
| 37 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 35 | 3 | 0.1% |
| 33 | 2 | < 0.1% |
| 32 | 2 | < 0.1% |
| 31 | 8 | 0.2% |
| 30 | 7 | 0.1% |
Sentence Count v2
Real number (ℝ)
| Distinct | 39 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.7523285 |
| Minimum | 2 |
|---|---|
| Maximum | 41 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 7 |
| median | 9 |
| Q3 | 11 |
| 95-th percentile | 19 |
| Maximum | 41 |
| Range | 39 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 4.6813387 |
|---|---|
| Coefficient of variation (CV) | 0.48002266 |
| Kurtosis | 5.2278219 |
| Mean | 9.7523285 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.8978978 |
| Sum | 46070 |
| Variance | 21.914932 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8 | 663 | 14.0% |
| 7 | 646 | 13.7% |
| 9 | 535 | 11.3% |
| 6 | 517 | 10.9% |
| 10 | 412 | 8.7% |
| 11 | 325 | 6.9% |
| 5 | 302 | 6.4% |
| 12 | 244 | 5.2% |
| 13 | 168 | 3.6% |
| 14 | 142 | 3.0% |
| Other values (29) | 770 | 16.3% |
| Value | Count | Frequency (%) |
| 2 | 12 | 0.3% |
| 3 | 45 | 1.0% |
| 4 | 135 | 2.9% |
| 5 | 302 | 6.4% |
| 6 | 517 | 10.9% |
| 7 | 646 | 13.7% |
| 8 | 663 | 14.0% |
| 9 | 535 | 11.3% |
| 10 | 412 | 8.7% |
| 11 | 325 | 6.9% |
| Value | Count | Frequency (%) |
| 41 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 38 | 1 | < 0.1% |
| 37 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 35 | 2 | < 0.1% |
| 34 | 4 | 0.1% |
| 33 | 4 | 0.1% |
| 32 | 3 | 0.1% |
| 31 | 5 | 0.1% |
Paragraphs
Real number (ℝ)
| Distinct | 18 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.542337 |
| Minimum | 1 |
|---|---|
| Maximum | 20 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 6 |
| Maximum | 20 |
| Range | 19 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.8662981 |
|---|---|
| Coefficient of variation (CV) | 0.7340876 |
| Kurtosis | 8.8809902 |
| Mean | 2.542337 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 2.2850584 |
| Sum | 12010 |
| Variance | 3.4830685 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 1644 | 34.8% |
| 2 | 1263 | 26.7% |
| 3 | 812 | 17.2% |
| 4 | 435 | 9.2% |
| 5 | 253 | 5.4% |
| 6 | 128 | 2.7% |
| 7 | 81 | 1.7% |
| 8 | 44 | 0.9% |
| 10 | 19 | 0.4% |
| 9 | 19 | 0.4% |
| Other values (8) | 26 | 0.6% |
| Value | Count | Frequency (%) |
| 1 | 1644 | 34.8% |
| 2 | 1263 | 26.7% |
| 3 | 812 | 17.2% |
| 4 | 435 | 9.2% |
| 5 | 253 | 5.4% |
| 6 | 128 | 2.7% |
| 7 | 81 | 1.7% |
| 8 | 44 | 0.9% |
| 9 | 19 | 0.4% |
| 10 | 19 | 0.4% |
| Value | Count | Frequency (%) |
| 20 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 16 | 2 | < 0.1% |
| 15 | 4 | 0.1% |
| 14 | 5 | 0.1% |
| 13 | 1 | < 0.1% |
| 12 | 7 | 0.1% |
| 11 | 5 | 0.1% |
| 10 | 19 | 0.4% |
| 9 | 19 | 0.4% |
BT Easiness
Real number (ℝ)
| Distinct | 4724 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -0.95763863 |
| Minimum | -3.6762678 |
|---|---|
| Maximum | 1.7113898 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 3831 |
| Negative (%) | 81.1% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -3.6762678 |
|---|---|
| 5-th percentile | -2.7037429 |
| Q1 | -1.6965546 |
| median | -0.90909418 |
| Q3 | -0.20342801 |
| 95-th percentile | 0.68065368 |
| Maximum | 1.7113898 |
| Range | 5.3876576 |
| Interquartile range (IQR) | 1.4931266 |
Descriptive statistics
| Standard deviation | 1.0336564 |
|---|---|
| Coefficient of variation (CV) | -1.0793804 |
| Kurtosis | -0.48537058 |
| Mean | -0.95763863 |
| Median Absolute Deviation (MAD) | 0.7480769 |
| Skewness | -0.13451472 |
| Sum | -4523.8849 |
| Variance | 1.0684455 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -0.583532619 | 1 | < 0.1% |
| -1.572730112 | 1 | < 0.1% |
| -3.431114154 | 1 | < 0.1% |
| -0.755286774 | 1 | < 0.1% |
| -0.803262973 | 1 | < 0.1% |
| -0.519230627 | 1 | < 0.1% |
| -2.141945369 | 1 | < 0.1% |
| -0.081595481 | 1 | < 0.1% |
| -0.357133172 | 1 | < 0.1% |
| -0.437143526 | 1 | < 0.1% |
| Other values (4714) | 4714 | 99.7% |
| (Missing) | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| -3.676267773 | 1 | < 0.1% |
| -3.66836041 | 1 | < 0.1% |
| -3.64289216 | 1 | < 0.1% |
| -3.639935554 | 1 | < 0.1% |
| -3.636833783 | 1 | < 0.1% |
| -3.596750775 | 1 | < 0.1% |
| -3.591318724 | 1 | < 0.1% |
| -3.590328227 | 1 | < 0.1% |
| -3.585369303 | 1 | < 0.1% |
| -3.549190203 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1.711389827 | 1 | < 0.1% |
| 1.658697523 | 1 | < 0.1% |
| 1.597869841 | 1 | < 0.1% |
| 1.583846826 | 1 | < 0.1% |
| 1.58010057 | 1 | < 0.1% |
| 1.546966393 | 1 | < 0.1% |
| 1.541671879 | 1 | < 0.1% |
| 1.467665465 | 1 | < 0.1% |
| 1.465592368 | 1 | < 0.1% |
| 1.465054812 | 1 | < 0.1% |
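Assuming BT refers to a Bradley-Terry model fitted to pairwise comparisons of excerpts (the report itself does not define the abbreviation), BT Easiness is on a log-odds scale where only differences between excerpts are meaningful; that is why a mean of -0.96 with 81.1% negative values is not itself a problem. A minimal sketch of the Bradley-Terry win probability under that assumption; `bt_prob_easier` is a hypothetical helper:

```python
import math


def bt_prob_easier(easiness_i: float, easiness_j: float) -> float:
    """Bradley-Terry probability that excerpt i is judged easier than excerpt j.

    Equal to exp(b_i) / (exp(b_i) + exp(b_j)), written as a logistic function
    of the difference b_i - b_j.
    """
    return 1.0 / (1.0 + math.exp(-(easiness_i - easiness_j)))


# Easiest vs. hardest excerpt in this column (maximum vs. minimum BT Easiness).
p_extremes = bt_prob_easier(1.711389827, -3.676267773)  # close to 1
```

Two excerpts with equal easiness get probability 0.5, and swapping the arguments gives the complementary probability.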
BT s.e.
Real number (ℝ)
| Distinct | 4724 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.49121615 |
| Minimum | 0 |
|---|---|
| Maximum | 0.6496713 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.45083726 |
| Q1 | 0.46866285 |
| median | 0.48445233 |
| Q3 | 0.50614283 |
| 95-th percentile | 0.55662515 |
| Maximum | 0.6496713 |
| Range | 0.6496713 |
| Interquartile range (IQR) | 0.037479982 |
Descriptive statistics
| Standard deviation | 0.033998652 |
|---|---|
| Coefficient of variation (CV) | 0.069213221 |
| Kurtosis | 11.758801 |
| Mean | 0.49121615 |
| Median Absolute Deviation (MAD) | 0.018108844 |
| Skewness | 0.71895997 |
| Sum | 2320.5051 |
| Variance | 0.0011559083 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.449156065 | 1 | < 0.1% |
| 0.502195987 | 1 | < 0.1% |
| 0.600151746 | 1 | < 0.1% |
| 0.484693766 | 1 | < 0.1% |
| 0.46455111 | 1 | < 0.1% |
| 0.479176332 | 1 | < 0.1% |
| 0.526451773 | 1 | < 0.1% |
| 0.507193133 | 1 | < 0.1% |
| 0.481407983 | 1 | < 0.1% |
| 0.462594537 | 1 | < 0.1% |
| Other values (4714) | 4714 | 99.7% |
| (Missing) | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 0.427220021 | 1 | < 0.1% |
| 0.428232657 | 1 | < 0.1% |
| 0.430425066 | 1 | < 0.1% |
| 0.43129656 | 1 | < 0.1% |
| 0.431815319 | 1 | < 0.1% |
| 0.433000257 | 1 | < 0.1% |
| 0.433103135 | 1 | < 0.1% |
| 0.43370786 | 1 | < 0.1% |
| 0.434138091 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.649671297 | 1 | < 0.1% |
| 0.649028675 | 1 | < 0.1% |
| 0.648732745 | 1 | < 0.1% |
| 0.648481117 | 1 | < 0.1% |
| 0.648473916 | 1 | < 0.1% |
| 0.648174341 | 1 | < 0.1% |
| 0.64783414 | 1 | < 0.1% |
| 0.646942357 | 1 | < 0.1% |
| 0.646906876 | 1 | < 0.1% |
| 0.646899678 | 1 | < 0.1% |
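BT s.e. is presumably the standard error of each excerpt's BT Easiness estimate. Its median (0.484) is nearly half the standard deviation of BT Easiness (1.034), so per-excerpt uncertainty is large relative to the spread of scores. A minimal sketch of an approximate 95% interval, assuming a normal approximation (the report does not state any interval method); `wald_interval` is a hypothetical helper, applied here to the sample row's values:

```python
def wald_interval(estimate: float, std_error: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate normal-theory interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# BT Easiness and BT s.e. from the sample row (ID 400, "Patty's Suitors").
low, high = wald_interval(-0.340259, 0.464009)  # roughly (-1.25, 0.57)
```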
Flesch-Reading-Ease
Real number (ℝ)
| Distinct | 3282 |
|---|---|
| Distinct (%) | 69.5% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 65.23123 |
| Minimum | -28.99 |
|---|---|
| Maximum | 114.03 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 7 |
| Negative (%) | 0.1% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -28.99 |
|---|---|
| 5-th percentile | 33.906 |
| Q1 | 53.6275 |
| median | 66.33 |
| Q3 | 78.65 |
| 95-th percentile | 92.4085 |
| Maximum | 114.03 |
| Range | 143.02 |
| Interquartile range (IQR) | 25.0225 |
Descriptive statistics
| Standard deviation | 18.178085 |
|---|---|
| Coefficient of variation (CV) | 0.27867151 |
| Kurtosis | 0.47533703 |
| Mean | 65.23123 |
| Median Absolute Deviation (MAD) | 12.53 |
| Skewness | -0.50510332 |
| Sum | 308152.33 |
| Variance | 330.44279 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 73.2 | 8 | 0.2% |
| 61.33 | 7 | 0.1% |
| 55.88 | 7 | 0.1% |
| 75.43 | 6 | 0.1% |
| 71.72 | 6 | 0.1% |
| 57.61 | 6 | 0.1% |
| 64.6 | 6 | 0.1% |
| 58.03 | 6 | 0.1% |
| 61.11 | 6 | 0.1% |
| 75.69 | 6 | 0.1% |
| Other values (3272) | 4660 | 98.6% |
| Value | Count | Frequency (%) |
| -28.99 | 1 | < 0.1% |
| -25.84 | 1 | < 0.1% |
| -21.33 | 1 | < 0.1% |
| -14.79 | 1 | < 0.1% |
| -8.59 | 1 | < 0.1% |
| -4.93 | 1 | < 0.1% |
| -3.28 | 1 | < 0.1% |
| 0.61 | 1 | < 0.1% |
| 2.69 | 1 | < 0.1% |
| 3.05 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 114.03 | 1 | < 0.1% |
| 112.52 | 1 | < 0.1% |
| 111.11 | 1 | < 0.1% |
| 109.82 | 1 | < 0.1% |
| 108.74 | 1 | < 0.1% |
| 108.35 | 1 | < 0.1% |
| 107.38 | 1 | < 0.1% |
| 106.67 | 1 | < 0.1% |
| 106.23 | 1 | < 0.1% |
| 106.19 | 1 | < 0.1% |
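This column runs from -28.99 to 114.03, outside the nominal 0 to 100 scale. The standard Flesch Reading Ease formula is unbounded in both directions, as a sketch from raw counts shows. Syllable counting is heuristic and left out, and the report does not name the tool that produced the column, so this is the textbook formula rather than a reproduction of these values:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease from raw counts (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


typical = flesch_reading_ease(words=100, sentences=5, syllables=150)   # about 59.6
very_easy = flesch_reading_ease(words=10, sentences=2, syllables=10)   # above 100
very_hard = flesch_reading_ease(words=100, sentences=1, syllables=200) # below 0
```

Short sentences of one-syllable words push the score past 100, while a single long sentence full of polysyllabic words drives it negative.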
Flesch-Kincaid-Grade-Level
Real number (ℝ)
| Distinct | 1593 |
|---|---|
| Distinct (%) | 33.7% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.508315 |
| Minimum | -1.04 |
|---|---|
| Maximum | 42.64 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 7 |
| Negative (%) | 0.1% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -1.04 |
|---|---|
| 5-th percentile | 3.0715 |
| Q1 | 6.56 |
| median | 9.35 |
| Q3 | 11.9725 |
| 95-th percentile | 16.6385 |
| Maximum | 42.64 |
| Range | 43.68 |
| Interquartile range (IQR) | 5.4125 |
Descriptive statistics
| Standard deviation | 4.3259184 |
|---|---|
| Coefficient of variation (CV) | 0.45496162 |
| Kurtosis | 3.5597635 |
| Mean | 9.508315 |
| Median Absolute Deviation (MAD) | 2.705 |
| Skewness | 0.95472209 |
| Sum | 44917.28 |
| Variance | 18.71357 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 10.14 | 12 | 0.3% |
| 7.5 | 12 | 0.3% |
| 11.81 | 11 | 0.2% |
| 9.26 | 10 | 0.2% |
| 11.52 | 10 | 0.2% |
| 8.08 | 10 | 0.2% |
| 10.37 | 10 | 0.2% |
| 10.96 | 10 | 0.2% |
| 11.39 | 9 | 0.2% |
| 12.4 | 9 | 0.2% |
| Other values (1583) | 4621 | 97.8% |
| Value | Count | Frequency (%) |
| -1.04 | 1 | < 0.1% |
| -1.02 | 1 | < 0.1% |
| -0.28 | 1 | < 0.1% |
| -0.15 | 1 | < 0.1% |
| -0.08 | 2 | < 0.1% |
| -0.06 | 1 | < 0.1% |
| 0.04 | 1 | < 0.1% |
| 0.09 | 1 | < 0.1% |
| 0.2 | 1 | < 0.1% |
| 0.22 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 42.64 | 1 | < 0.1% |
| 41.33 | 1 | < 0.1% |
| 39.29 | 1 | < 0.1% |
| 36.2 | 1 | < 0.1% |
| 35.45 | 1 | < 0.1% |
| 33.33 | 1 | < 0.1% |
| 32.83 | 1 | < 0.1% |
| 31.71 | 1 | < 0.1% |
| 30.65 | 1 | < 0.1% |
| 30.56 | 1 | < 0.1% |
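The seven negative grade levels (the smallest values, -1.04 to -0.06) are possible because the standard Flesch-Kincaid Grade Level formula subtracts a constant of 15.59. A sketch from raw counts, with syllable counting left out and no claim that it matches the tool behind this column:

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


typical = flesch_kincaid_grade(words=100, sentences=5, syllables=150)  # about 9.9
below_zero = flesch_kincaid_grade(words=10, sentences=2, syllables=10) # short, monosyllabic text
```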
Automated Readability Index
Real number (ℝ)
| Distinct | 1789 |
|---|---|
| Distinct (%) | 37.9% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.26463 |
| Minimum | -3.09 |
|---|---|
| Maximum | 51.59 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 37 |
| Negative (%) | 0.8% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -3.09 |
|---|---|
| 5-th percentile | 2.5315 |
| Q1 | 6.82 |
| median | 10.05 |
| Q3 | 13.21 |
| 95-th percentile | 18.6955 |
| Maximum | 51.59 |
| Range | 54.68 |
| Interquartile range (IQR) | 6.39 |
Descriptive statistics
| Standard deviation | 5.2401554 |
|---|---|
| Coefficient of variation (CV) | 0.51050604 |
| Kurtosis | 4.4785757 |
| Mean | 10.26463 |
| Median Absolute Deviation (MAD) | 3.19 |
| Skewness | 1.0743285 |
| Sum | 48490.11 |
| Variance | 27.459229 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 10.43 | 10 | 0.2% |
| 12.91 | 10 | 0.2% |
| 13.79 | 10 | 0.2% |
| 9.06 | 9 | 0.2% |
| 11.32 | 9 | 0.2% |
| 12.14 | 9 | 0.2% |
| 9.81 | 9 | 0.2% |
| 13.07 | 9 | 0.2% |
| 11.77 | 9 | 0.2% |
| 12.47 | 9 | 0.2% |
| Other values (1779) | 4631 | 98.0% |
| Value | Count | Frequency (%) |
| -3.09 | 1 | < 0.1% |
| -2.81 | 1 | < 0.1% |
| -2 | 1 | < 0.1% |
| -1.66 | 1 | < 0.1% |
| -1.54 | 1 | < 0.1% |
| -1.53 | 1 | < 0.1% |
| -1.15 | 1 | < 0.1% |
| -1.05 | 1 | < 0.1% |
| -0.95 | 1 | < 0.1% |
| -0.9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 51.59 | 1 | < 0.1% |
| 50.12 | 1 | < 0.1% |
| 48.71 | 1 | < 0.1% |
| 43.35 | 1 | < 0.1% |
| 42.87 | 1 | < 0.1% |
| 42.15 | 1 | < 0.1% |
| 39.89 | 1 | < 0.1% |
| 39.15 | 1 | < 0.1% |
| 37.19 | 1 | < 0.1% |
| 36.78 | 1 | < 0.1% |
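The Automated Readability Index is driven by characters per word and words per sentence, so short words in short sentences produce the 37 negative values seen here. A sketch of the standard formula from raw counts; how characters are counted (letters and digits only, here) is an assumption, since the report does not describe the tool used:

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """Standard ARI; `characters` is assumed to count letters and digits, not spaces."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


typical = automated_readability_index(characters=400, words=100, sentences=5)  # about 7.4
negative = automated_readability_index(characters=30, words=10, sentences=5)   # 3-letter words, 2-word sentences
```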
SMOG Readability
Real number (ℝ)
| Distinct | 29 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.198582 |
| Minimum | 0 |
|---|---|
| Maximum | 18 |
| Zeros | 53 |
| Zeros (%) | 1.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 8 |
| median | 10 |
| Q3 | 12 |
| 95-th percentile | 16 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 3.263997 |
|---|---|
| Coefficient of variation (CV) | 0.3200442 |
| Kurtosis | 0.14203313 |
| Mean | 10.198582 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.0084267015 |
| Sum | 48178.1 |
| Variance | 10.653676 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 11 | 583 | 12.3% |
| 10 | 554 | 11.7% |
| 12 | 522 | 11.0% |
| 9 | 484 | 10.2% |
| 8 | 482 | 10.2% |
| 7 | 403 | 8.5% |
| 13 | 326 | 6.9% |
| 14 | 304 | 6.4% |
| 6 | 303 | 6.4% |
| 5 | 231 | 4.9% |
| Other values (19) | 532 | 11.3% |
| Value | Count | Frequency (%) |
| 0 | 53 | 1.1% |
| 3 | 1 | < 0.1% |
| 4 | 2 | < 0.1% |
| 4.41 | 3 | 0.1% |
| 4.73 | 2 | < 0.1% |
| 5 | 231 | 4.9% |
| 5.24 | 2 | < 0.1% |
| 5.45 | 3 | 0.1% |
| 5.65 | 2 | < 0.1% |
| 5.83 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 18 | 88 | 1.9% |
| 17 | 54 | 1.1% |
| 16 | 132 | 2.8% |
| 15 | 181 | 3.8% |
| 14 | 304 | 6.4% |
| 13 | 326 | 6.9% |
| 12 | 522 | 11.0% |
| 11 | 583 | 12.3% |
| 10 | 554 | 11.7% |
| 9 | 484 | 10.2% |
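The standard SMOG grade formula has a floor of about 3.13 (reached when a text has no words of three or more syllables), so the 53 zeros and the whole-number values 3 and 4 in this column cannot come from it directly. Treat the sketch below as the textbook definition, not necessarily the procedure used to build this column:

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """Standard SMOG grade; polysyllables = words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


example = smog_grade(polysyllables=12, sentences=10)  # sqrt(36) = 6, so about 9.39
floor = smog_grade(polysyllables=0, sentences=10)     # lowest value the formula can return
```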
New Dale-Chall Readability Formula
Real number (ℝ)
| Distinct | 816 |
|---|---|
| Distinct (%) | 17.3% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.6747841 |
| Minimum | 0.28 |
|---|---|
| Maximum | 14.19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 0.28 |
|---|---|
| 5-th percentile | 5.32 |
| Q1 | 6.5575 |
| median | 7.625 |
| Q3 | 8.87 |
| 95-th percentile | 10.8085 |
| Maximum | 14.19 |
| Range | 13.91 |
| Interquartile range (IQR) | 2.3125 |
Descriptive statistics
| Standard deviation | 1.9445503 |
|---|---|
| Coefficient of variation (CV) | 0.25336873 |
| Kurtosis | 2.1298538 |
| Mean | 7.6747841 |
| Median Absolute Deviation (MAD) | 1.155 |
| Skewness | -0.59961801 |
| Sum | 36255.68 |
| Variance | 3.7812758 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7.39 | 23 | 0.5% |
| 7.89 | 19 | 0.4% |
| 7.26 | 19 | 0.4% |
| 7.36 | 19 | 0.4% |
| 7.04 | 18 | 0.4% |
| 6.98 | 18 | 0.4% |
| 7.81 | 18 | 0.4% |
| 7.65 | 17 | 0.4% |
| 6.14 | 17 | 0.4% |
| 6.91 | 17 | 0.4% |
| Other values (806) | 4539 | 96.0% |
| Value | Count | Frequency (%) |
| 0.28 | 1 | < 0.1% |
| 0.33 | 1 | < 0.1% |
| 0.47 | 1 | < 0.1% |
| 0.58 | 1 | < 0.1% |
| 0.59 | 1 | < 0.1% |
| 0.6 | 1 | < 0.1% |
| 0.65 | 1 | < 0.1% |
| 0.66 | 1 | < 0.1% |
| 0.68 | 1 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 14.19 | 2 | < 0.1% |
| 13.83 | 1 | < 0.1% |
| 13.56 | 1 | < 0.1% |
| 13.35 | 1 | < 0.1% |
| 13.22 | 1 | < 0.1% |
| 13.07 | 1 | < 0.1% |
| 13.03 | 1 | < 0.1% |
| 13.02 | 1 | < 0.1% |
| 12.89 | 1 | < 0.1% |
| 12.81 | 1 | < 0.1% |
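A sketch of the commonly implemented Dale-Chall score formula, which combines the percentage of "difficult" words (words not on the Dale-Chall familiar-word list) with average sentence length and adds a constant when more than 5% of words are difficult. The word list is not reproduced here, so the function takes the difficult-word count as input. The report does not say which Dale-Chall variant produced this column, so this is an assumption, not a reconstruction:

```python
def dale_chall_score(words: int, sentences: int, difficult_words: int) -> float:
    """Dale-Chall raw score from counts; difficult words are those off the familiar-word list."""
    pct_difficult = 100 * difficult_words / words
    score = 0.1579 * pct_difficult + 0.0496 * (words / sentences)
    if pct_difficult > 5:
        score += 3.6365  # adjustment applied when more than 5% of words are difficult
    return score


harder = dale_chall_score(words=100, sentences=5, difficult_words=10)  # adjustment applies
easier = dale_chall_score(words=100, sentences=5, difficult_words=3)   # no adjustment
```

Without the adjustment, very easy texts can score well below 1, in line with the column's minimum of 0.28.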
CAREC
Real number (ℝ)
| Distinct | 4453 |
|---|---|
| Distinct (%) | 94.3% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.16534191 |
| Minimum | -0.16835 |
|---|---|
| Maximum | 0.59977 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 290 |
| Negative (%) | 6.1% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -0.16835 |
|---|---|
| 5-th percentile | -0.008843 |
| Q1 | 0.0855675 |
| median | 0.16299 |
| Q3 | 0.2409675 |
| 95-th percentile | 0.347209 |
| Maximum | 0.59977 |
| Range | 0.76812 |
| Interquartile range (IQR) | 0.1554 |
Descriptive statistics
| Standard deviation | 0.1090872 |
|---|---|
| Coefficient of variation (CV) | 0.65976737 |
| Kurtosis | -0.29789085 |
| Mean | 0.16534191 |
| Median Absolute Deviation (MAD) | 0.077515 |
| Skewness | 0.13458062 |
| Sum | 781.07519 |
| Variance | 0.011900017 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.18019 | 3 | 0.1% |
| 0.11538 | 3 | 0.1% |
| 0.17252 | 3 | 0.1% |
| 0.15075 | 3 | 0.1% |
| 0.14392 | 3 | 0.1% |
| 0.13423 | 3 | 0.1% |
| 0.15939 | 3 | 0.1% |
| 0.19166 | 3 | 0.1% |
| 0.10147 | 3 | 0.1% |
| 0.11543 | 3 | 0.1% |
| Other values (4443) | 4694 | 99.3% |
| Value | Count | Frequency (%) |
| -0.16835 | 1 | < 0.1% |
| -0.12473 | 1 | < 0.1% |
| -0.12448 | 1 | < 0.1% |
| -0.1209 | 1 | < 0.1% |
| -0.11874 | 1 | < 0.1% |
| -0.11772 | 1 | < 0.1% |
| -0.11604 | 1 | < 0.1% |
| -0.11569 | 1 | < 0.1% |
| -0.11552 | 1 | < 0.1% |
| -0.11129 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.59977 | 1 | < 0.1% |
| 0.51909 | 1 | < 0.1% |
| 0.50554 | 1 | < 0.1% |
| 0.50005 | 1 | < 0.1% |
| 0.49394 | 1 | < 0.1% |
| 0.4933 | 1 | < 0.1% |
| 0.49146 | 1 | < 0.1% |
| 0.4903 | 1 | < 0.1% |
| 0.48207 | 1 | < 0.1% |
| 0.48001 | 1 | < 0.1% |
CAREC_M
Real number (ℝ)
| Distinct | 4427 |
|---|---|
| Distinct (%) | 93.7% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.16639021 |
| Minimum | -0.14208 |
|---|---|
| Maximum | 0.59485 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 239 |
| Negative (%) | 5.1% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -0.14208 |
|---|---|
| 5-th percentile | -0.000104 |
| Q1 | 0.08968 |
| median | 0.1649 |
| Q3 | 0.23889 |
| 95-th percentile | 0.3446325 |
| Maximum | 0.59485 |
| Range | 0.73693 |
| Interquartile range (IQR) | 0.14921 |
Descriptive statistics
| Standard deviation | 0.1057609 |
|---|---|
| Coefficient of variation (CV) | 0.63561976 |
| Kurtosis | -0.25096943 |
| Mean | 0.16639021 |
| Median Absolute Deviation (MAD) | 0.074775 |
| Skewness | 0.15271916 |
| Sum | 786.02735 |
| Variance | 0.011185369 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.13214 | 3 | 0.1% |
| 0.26329 | 3 | 0.1% |
| 0.1444 | 3 | 0.1% |
| 0.20688 | 3 | 0.1% |
| 0.22722 | 3 | 0.1% |
| 0.17741 | 3 | 0.1% |
| 0.05491 | 3 | 0.1% |
| 0.09992 | 3 | 0.1% |
| 0.27077 | 3 | 0.1% |
| 0.13357 | 3 | 0.1% |
| Other values (4417) | 4694 | 99.3% |
| Value | Count | Frequency (%) |
| -0.14208 | 1 | < 0.1% |
| -0.12473 | 1 | < 0.1% |
| -0.11201 | 1 | < 0.1% |
| -0.11129 | 1 | < 0.1% |
| -0.10533 | 1 | < 0.1% |
| -0.10408 | 1 | < 0.1% |
| -0.10297 | 1 | < 0.1% |
| -0.101 | 1 | < 0.1% |
| -0.09737 | 1 | < 0.1% |
| -0.09714 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.59485 | 1 | < 0.1% |
| 0.53635 | 1 | < 0.1% |
| 0.50422 | 1 | < 0.1% |
| 0.5034 | 1 | < 0.1% |
| 0.50237 | 1 | < 0.1% |
| 0.49807 | 1 | < 0.1% |
| 0.491 | 1 | < 0.1% |
| 0.47973 | 1 | < 0.1% |
| 0.47925 | 1 | < 0.1% |
| 0.47448 | 1 | < 0.1% |
CARES
Real number (ℝ)
| Distinct | 4718 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.46763823 |
| Minimum | 0.1253763 |
|---|---|
| Maximum | 0.79480457 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | 0.1253763 |
|---|---|
| 5-th percentile | 0.30554659 |
| Q1 | 0.39789749 |
| median | 0.46720663 |
| Q3 | 0.5327229 |
| 95-th percentile | 0.63431625 |
| Maximum | 0.79480457 |
| Range | 0.66942827 |
| Interquartile range (IQR) | 0.1348254 |
Descriptive statistics
| Standard deviation | 0.09925348 |
|---|---|
| Coefficient of variation (CV) | 0.21224415 |
| Kurtosis | -0.13190581 |
| Mean | 0.46763823 |
| Median Absolute Deviation (MAD) | 0.067795855 |
| Skewness | 0.1277529 |
| Sum | 2209.123 |
| Variance | 0.0098512533 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.441461937 | 2 | < 0.1% |
| 0.490928066 | 2 | < 0.1% |
| 0.475496151 | 2 | < 0.1% |
| 0.455844462 | 2 | < 0.1% |
| 0.5530856 | 2 | < 0.1% |
| 0.608789761 | 2 | < 0.1% |
| 0.379458543 | 1 | < 0.1% |
| 0.562123911 | 1 | < 0.1% |
| 0.545237058 | 1 | < 0.1% |
| 0.33840951 | 1 | < 0.1% |
| Other values (4708) | 4708 | 99.6% |
| (Missing) | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.125376301 | 1 | < 0.1% |
| 0.167028985 | 1 | < 0.1% |
| 0.168452719 | 1 | < 0.1% |
| 0.172647172 | 1 | < 0.1% |
| 0.179377876 | 1 | < 0.1% |
| 0.187101315 | 1 | < 0.1% |
| 0.195408368 | 1 | < 0.1% |
| 0.207173435 | 1 | < 0.1% |
| 0.208074536 | 1 | < 0.1% |
| 0.215594964 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.794804573 | 1 | < 0.1% |
| 0.783490663 | 1 | < 0.1% |
| 0.779409813 | 1 | < 0.1% |
| 0.778895954 | 1 | < 0.1% |
| 0.777795944 | 1 | < 0.1% |
| 0.777785786 | 1 | < 0.1% |
| 0.771802197 | 1 | < 0.1% |
| 0.768526867 | 1 | < 0.1% |
| 0.763162647 | 1 | < 0.1% |
| 0.761037659 | 1 | < 0.1% |
CML2RI
Real number (ℝ)
| Distinct | 4717 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.465939 |
| Minimum | -4.3808989 |
|---|---|
| Maximum | 47.214743 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 29 |
| Negative (%) | 0.6% |
| Memory size | 37.0 KiB |
Quantile statistics
| Minimum | -4.3808989 |
|---|---|
| 5-th percentile | 4.2714313 |
| Q1 | 10.102327 |
| median | 15.054398 |
| Q3 | 20.131281 |
| 95-th percentile | 28.446988 |
| Maximum | 47.214743 |
| Range | 51.595642 |
| Interquartile range (IQR) | 10.028953 |
Descriptive statistics
| Standard deviation | 7.4307516 |
|---|---|
| Coefficient of variation (CV) | 0.48045912 |
| Kurtosis | 0.14653784 |
| Mean | 15.465939 |
| Median Absolute Deviation (MAD) | 5.0091365 |
| Skewness | 0.42743616 |
| Sum | 73061.097 |
| Variance | 55.216069 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 27.26119011 | 2 | < 0.1% |
| 16.72484566 | 2 | < 0.1% |
| 8.329344196 | 2 | < 0.1% |
| 8.641413463 | 2 | < 0.1% |
| 4.439543911 | 2 | < 0.1% |
| 22.0149707 | 2 | < 0.1% |
| 26.99212229 | 2 | < 0.1% |
| 7.384839158 | 1 | < 0.1% |
| 13.89758787 | 1 | < 0.1% |
| 5.222149765 | 1 | < 0.1% |
| Other values (4707) | 4707 | 99.6% |
| (Missing) | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| -4.380898885 | 1 | < 0.1% |
| -3.990789033 | 1 | < 0.1% |
| -2.157767475 | 1 | < 0.1% |
| -2.113132089 | 1 | < 0.1% |
| -2.064493855 | 1 | < 0.1% |
| -1.790055958 | 1 | < 0.1% |
| -1.741719883 | 1 | < 0.1% |
| -1.686997053 | 1 | < 0.1% |
| -1.684816828 | 1 | < 0.1% |
| -1.314472553 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 47.21474347 | 1 | < 0.1% |
| 47.18554673 | 1 | < 0.1% |
| 44.65355554 | 1 | < 0.1% |
| 43.7716589 | 1 | < 0.1% |
| 42.88870907 | 1 | < 0.1% |
| 42.10028985 | 1 | < 0.1% |
| 41.72475135 | 1 | < 0.1% |
| 41.18325848 | 1 | < 0.1% |
| 40.94372932 | 1 | < 0.1% |
| 40.88614425 | 1 | < 0.1% |
Kaggle split
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Memory size | 37.0 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.5999153 |
| Min length | 4 |
Characters and Unicode
| Total characters | 21730 |
|---|---|
| Distinct characters | 8 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Train |
|---|---|
| 2nd row | Train |
| 3rd row | Train |
| 4th row | Test |
| 5th row | Train |
Common Values
| Value | Count | Frequency (%) |
| Train | 2834 | 60.0% |
| Test | 1890 | 40.0% |
| (Missing) | 2 | < 0.1% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| train | 2834 | 60.0% |
| test | 1890 | 40.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| T | 4724 | 21.7% |
| r | 2834 | 13.0% |
| a | 2834 | 13.0% |
| i | 2834 | 13.0% |
| n | 2834 | 13.0% |
| e | 1890 | 8.7% |
| s | 1890 | 8.7% |
| t | 1890 | 8.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 17006 | 78.3% |
| Uppercase Letter | 4724 | 21.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 2834 | 16.7% |
| a | 2834 | 16.7% |
| i | 2834 | 16.7% |
| n | 2834 | 16.7% |
| e | 1890 | 11.1% |
| s | 1890 | 11.1% |
| t | 1890 | 11.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| T | 4724 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 21730 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| T | 4724 | 21.7% |
| r | 2834 | 13.0% |
| a | 2834 | 13.0% |
| i | 2834 | 13.0% |
| n | 2834 | 13.0% |
| e | 1890 | 8.7% |
| s | 1890 | 8.7% |
| t | 1890 | 8.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 21730 | 100.0% |
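Python's standard `unicodedata` module exposes general categories but not script or block properties, so a full script/block breakdown needs a third-party Unicode database. For this column the block check reduces to whether every character lies in the ASCII range U+0000 to U+007F, which a minimal sketch can confirm:

```python
kaggle_split = ["Train"] * 2834 + ["Test"] * 1890 + [None] * 2
text = "".join(v for v in kaggle_split if v is not None)

# str.isascii() (Python 3.7+) is True when every code point is in U+0000..U+007F.
all_ascii = text.isascii()
ascii_chars = sum(1 for ch in text if ord(ch) <= 0x7F)
print(all_ascii, ascii_chars, len(text))  # True 21730 21730
```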
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| T | 4724 | 21.7% |
| r | 2834 | 13.0% |
| a | 2834 | 13.0% |
| i | 2834 | 13.0% |
| n | 2834 | 13.0% |
| e | 1890 | 8.7% |
| s | 1890 | 8.7% |
| t | 1890 | 8.7% |
First rows
| | ID | Author | Title | Source | Pub Year | Category | Location | MPAA Max | Excerpt | Google WC | Joon WC v1 | British WC | British Words | Sentence Count v1 | Sentence Count v2 | Paragraphs | BT Easiness | BT s.e. | Flesch-Reading-Ease | Flesch-Kincaid-Grade-Level | Automated Readability Index | SMOG Readability | New Dale-Chall Readability Formula | CAREC | CAREC_M | CARES | CML2RI | Kaggle split |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 400.0 | Carolyn Wells | Patty's Suitors | gutenberg | 1914.0 | Lit | mid | G | When the young people returned to the ballroom, it presented a decidedly changed appearance. Instead of an interior scene, it was a winter landscape.\nThe floor was covered with snow-white canvas, not laid on smoothly, but rumpled over bumps and hillocks, like a real snow field. The numerous palms and evergreens that had decorated the room, were powdered with flour and strewn with tufts of cotton, like snow. Also diamond dust had been lightly sprinkled on them, and glittering crystal icicles hung from the branches.\nAt each end of the room, on the wall, hung a beautiful bear-skin rug.\nThese rugs were for prizes, one for the girls and one for the boys. And this was the game.\nThe girls were gathered at one end of the room and the boys at the other, and one end was called the North Pole, and the other the South Pole. Each player was given a small flag which they were to plant on reaching the Pole.\nThis would have been an easy matter, but each traveller was obliged to wear snowshoes. | 174.0 | 179.0 | 1.0 | traveller | 11.0 | 11.0 | 6.0 | -0.340259 | 0.464009 | 81.70 | 5.95 | 7.37 | 8.0 | 6.55 | 0.12102 | 0.11952 | 0.457534 | 12.097815 | Train |
| 1 | 401.0 | Carolyn Wells | Two Little Women on a Holiday | gutenberg | 1917.0 | Lit | mid | PG | All through dinner time, Mrs. Fayre was somewhat silent, her eyes resting on Dolly with a wistful, uncertain expression. She wanted to give the child the pleasure she craved, but she had hard work to bring herself to the point of overcoming her own objections.\nAt last, however, when the meal was nearly over, she smiled at her little daughter, and said, "All right, Dolly, you may go."\n"Oh, mother!" Dolly cried, overwhelmed with sudden delight. "Really?\nOh, I am so glad! Are you sure you're willing?"\n"I've persuaded myself to be willing, against my will," returned Mrs. Fayre, whimsically. "I confess I just hate to have you go, but I can't bear to deprive you of the pleasure trip. And, as you say, it would also keep Dotty at home, and so, altogether, I think I shall have to give in."\n"Oh, you angel mother! You blessed lady! How good you are!" And Dolly flew around the table and gave her mother a hug that nearly suffocated her. | 164.0 | 184.0 | 0.0 | NaN | 15.0 | 15.0 | 6.0 | -0.315372 | 0.480805 | 80.26 | 4.86 | 4.16 | 7.0 | 6.25 | 0.04921 | 0.04921 | 0.462510 | 22.550179 | Train |
| 2 | 402.0 | Carolyn Wells | Patty Blossom | gutenberg | 1917.0 | Lit | mid | PG | As Roger had predicted, the snow departed as quickly as it came, and two days after their sleigh ride there was scarcely a vestige of white on the ground. Tennis was again possible and a great game was in progress on the court at Pine Laurel. Patty and Roger were playing against Elise and Sam Blaney, and the pairs were well matched.\nBut the long-contested victory finally went against Patty, and she laughingly accepted defeat.\n"Only because Patty's not quite back on her game yet," Roger defended; "this child has been on the sick list, you know, Sam, and she isn't up to her own mark."\n"Well, I like that!" cried Patty; "suppose you bear half the blame, Roger. You see, Mr. Blaney, he is so absorbed in his own Love Game, he can't play with his old-time skill."\n"All right, Patsy, let it go at that. And it's so, too. I suddenly remembered something Mona told me to tell you, and it affected my service." | 162.0 | 180.0 | 0.0 | NaN | 11.0 | 11.0 | 5.0 | -0.580118 | 0.476676 | 79.04 | 6.03 | 5.81 | 9.0 | 7.31 | 0.10172 | 0.09724 | 0.369259 | 18.125279 | Train |
| 3 | 403.0 | CHARLES KINGSLEY | THE WATER-BABIES\nA Fairy Tale for a Land-Baby | gutenberg | 1863.0 | Lit | mid | PG-13 | Mr. Grimes was to come up next morning to Sir John Harthover's, at the Place, for his old chimney-sweep was gone to prison, and the chimneys wanted sweeping. And so he rode away, not giving Tom time to ask what the sweep had gone to prison for, which was a matter of interest to Tom, as he had been in prison once or twice himself. Moreover, the groom looked so very neat and clean, with his drab gaiters, drab breeches, drab jacket, snow-white tie with a smart pin in it, and clean round ruddy face, that Tom was offended and disgusted at his appearance, and considered him a stuck-up fellow, who gave himself airs because he wore smart clothes, and other people paid for them; and went behind the wall to fetch the half-brick after all; but did not, remembering that he had come in the way of business, and was, as it were, under a flag of truce. | 159.0 | 160.0 | 0.0 | NaN | 3.0 | 3.0 | 1.0 | -1.785965 | 0.526599 | 44.77 | 20.51 | 24.87 | 12.0 | 8.56 | 0.07491 | 0.08856 | 0.390759 | 10.959460 | Test |
| 4 | 404.0 | Charles Kingsley | HOW THE ARGONAUTS WERE DRIVEN INTO THE UNKNOWN SEA | gutenberg | 1889.0 | Lit | mid | PG | And outside before the palace a great garden was walled round, filled full of stately fruit-trees, gray olives and sweet figs, and pomegranates, pears, and apples, which bore the whole year round. For the rich south-west wind fed them, till pear grew ripe on pear, fig on fig, and grape on grape, all the winter and the spring. And at the farther end gay flower-beds bloomed through all seasons of the year; and two fair fountains rose, and ran, one through the garden grounds, and one beneath the palace gate, to water all the town. Such noble gifts the heavens had given to Alcinous the wise.\nSo they went in, and saw him sitting, like Poseidon, on his throne, with his golden sceptre by him, in garments stiff with gold, and in his hand a sculptured goblet, as he pledged the merchant kings; and beside him stood Arete, his wise and lovely queen, and leaned against a pillar as she spun her golden threads. | 163.0 | 164.0 | 1.0 | sceptre | 5.0 | 5.0 | 2.0 | -1.054013 | 0.450007 | 68.07 | 12.06 | 15.47 | 8.0 | 7.00 | 0.06356 | 0.08798 | 0.389226 | 3.195960 | Train |
| 5 | 405.0 | Charles Madison Curry\nErle Elsworth Clippinger | The Three Little Bears | gutenberg | 1920.0 | Lit | mid | G | Once upon a time there were Three Bears who lived together in a house of their own in a wood. One of them was a Little, Small, Wee Bear; and one was a Middle-sized Bear, and the other was a Great, Huge Bear. They had each a pot for their porridge; a little pot for the Little, Small, Wee Bear; and a middle-sized pot for the Middle Bear; and a great pot for the Great, Huge Bear. And they had each a chair to sit in; a little chair for the Little, Small, Wee Bear; and a middle-sized chair for the Middle Bear; and a great chair for the Great, Huge Bear. And they had each a bed to sleep in; a little bed for the Little, Small, Wee Bear; and a middle-sized bed for the Middle Bear; and a great bed for the Great, Huge Bear. | 147.0 | 147.0 | 0.0 | NaN | 5.0 | 5.0 | 1.0 | 0.247197 | 0.510845 | 80.94 | 9.47 | 10.76 | 5.0 | 1.71 | 0.35370 | 0.36885 | 0.301666 | 28.990105 | Train |
| 6 | 406.0 | Clair W. Hayes | The Boy Allies On the Firing Line\nOr, Twelve Days Battle Along the Marne | gutenberg | 1915.0 | Lit | mid | PG | Hal and Chester found ample time to take an inventory of the general's car. It was a huge machine, and besides being fitted up luxuriously was also furnished as an office, that the general might still be at work while he hurried from one part of the field to another when events demanded his immediate presence. Even now, with treachery threatening, and whirling along at a terrific speed, General Joffre, probably because of habit, fell to work sorting papers, studying maps and other drawings.\nFor almost two hours the car whirled along at top speed, and at length pulled up in the rear of an immense body of troops, who, even to Hal and Chester, could be seen preparing for an advance. General Joffre was out of the car before it came to a full stop, and Hal and Chester were at his heels. An orderly approached.\n"My respects to General Tromp, and tell him I desire his presence immediately," ordered General Joffre. | 161.0 | 166.0 | 0.0 | NaN | 7.0 | 7.0 | 3.0 | -0.861809 | 0.480936 | 59.67 | 10.72 | 11.45 | 11.0 | 8.09 | 0.15617 | 0.16523 | 0.419842 | 12.766583 | Train |
| 7 | 407.0 | Clair W. Hayes | The Boy Allies in Great Peril | gutenberg | 1916.0 | Lit | mid | PG-13 | Hal Paine and Chester Crawford were typical American boys. With the former's mother, they had been in Berlin when the great European conflagration broke out and had been stranded there. Mrs. Paine had been able to get out of the country, but Hal and Chester were left behind.\nIn company with Major Raoul Derevaux, a Frenchman, and Captain Harry Anderson, an Englishman, they finally made their way into Belgium, where they arrived in time to take part in the heroic defense of Liége in the early stages of the war. Here they rendered such invaluable service to the Belgian commander that they were commissioned lieutenants in the little army of King Albert.\nBoth in fighting and in scouting they had proven their worth. Following the first Belgian campaign, the two lads had seen service with the British troops on the continent, where they were attached to the staff of General Sir John French, in command of the English forces. Also they had won the respect and admiration of General Joffre, the French commander-in-chief. | 171.0 | 174.0 | 0.0 | NaN | 8.0 | 8.0 | 3.0 | -1.759061 | 0.476507 | 60.87 | 10.20 | 11.88 | 12.0 | 9.23 | 0.19484 | 0.18656 | 0.484475 | 14.130141 | Train |
| 8 | 408.0 | Clair W. Hayes | The Boy Allies At Verdun | gutenberg | 1917.0 | Lit | start | PG | On the twenty-second of February, 1916, an automobile sped northward along the French battle line that for almost two years had held back the armies of the German emperor, strive as they would to win their way farther into the heart of France. For months the opposing forces had battled to a draw from the North Sea to the boundary of Switzerland, until now, as the day waned—it was almost six o'clock—the hands of time drew closer and closer to the hour that was to mark the opening of the most bitter and destructive battle of the war, up to this time.\nIt was the eve of the battle of Verdun.\nThe occupants of the automobile as it sped northward numbered three. In the front seat, alone at the driver's wheel, a young man bent low. He was garbed in the uniform of a British lieutenant of cavalry. Close inspection would have revealed the fact that the young man was a youth of some eighteen years, fair and good to look upon. | 170.0 | 173.0 | 0.0 | NaN | 7.0 | 8.0 | 3.0 | -0.952325 | 0.498116 | 68.79 | 9.80 | 11.03 | 11.0 | 7.51 | 0.11652 | 0.12905 | 0.430107 | 10.216473 | Train |
| 9 | 409.0 | Claude A. Labelle | The Ranger Boys and the Border Smugglers | gutenberg | 1922.0 | Lit | mid | PG | The boys left the capitol and made their way down the long hill to the main business part of the town. As they struck onto the main business street, Garry noticed the familiar blue bell sign of the telephone company.\n"Say, boys, I have an idea. Let's stop in here and put in long distance calls and say hello to our folks. How does the idea strike you?" said Garry, almost in one breath.\n"Ripping," shouted Phil, while Dick didn't wait to make any remark, but dived in through the door, and in a trice was putting in his call. Phil followed suit, while Garry waited, as he would talk when Dick had finished.\nThis pleasant duty done, they went to a restaurant for dinner. Here they attracted no little attention, for their khaki clothes looked almost like uniforms. Added to this was the fact that they wore forest shoepacks, those high laced moccasins with an extra leather sole, and felt campaign hats. | 160.0 | 169.0 | 0.0 | NaN | 11.0 | 10.0 | 4.0 | -0.371641 | 0.463710 | 79.22 | 6.26 | 7.33 | 9.0 | 6.96 | 0.07015 | 0.07326 | 0.377270 | 16.497078 | Train |
Last rows
| | ID | Author | Title | Source | Pub Year | Category | Location | MPAA Max | Excerpt | Google WC | Joon WC v1 | British WC | British Words | Sentence Count v1 | Sentence Count v2 | Paragraphs | BT Easiness | BT s.e. | Flesch-Reading-Ease | Flesch-Kincaid-Grade-Level | Automated Readability Index | SMOG Readability | New Dale-Chall Readability Formula | CAREC | CAREC_M | CARES | CML2RI | Kaggle split |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4716 | 8024.0 | wikijunior | Introduction to The Elements | wikibooks | 2013.0 | Info | start | G | The whole universe is built of matter. Right now, you are surrounded by it. The air we breathe is matter, and all the things you see around you are matter. The odors you smell are matter and the sounds you hear are caused by the movement of matter in your ears.\nMatter is everything that takes up space and has weight. Scientists say that matter has volume and mass. Matter is made up of tiny building blocks called atoms. The purest type of atom is called an element. The elements are what give matter its different qualities.\nToday we can see atoms by using a special instrument called an electron microscope. An electron microscope lets us see things that are millions of times smaller than the things we can see with a powerful optical microscope.\nMost of the matter around us has more than one element in it. But some matter is made up of just one element. If you have ever held a diamond, for example, it is made of just one element, Carbon. | 172.0 | 175.0 | 0.0 | NaN | 14.0 | 14.0 | 4.0 | 0.650829 | 0.544809 | 75.71 | 5.80 | 5.20 | 9.0 | 6.78 | 0.22018 | 0.21327 | 0.409172 | 26.844970 | Test |
| 4717 | 8025.0 | wikijunior | Solid Basics | wikibooks | 2019.0 | Info | start | G | So what is a solid? Solids are usually hard because their molecules have been packed together. The closer your molecules are, the harder you are. Solids also can hold their own shape. A rock will always look like a rock unless something happens to it. The same goes for a diamond. Even when you grind up a solid into a powder, you will see tiny little pieces of that solid under a microscope. Liquids will move and fill up any container. Solids keep their shape.\nIn the same way that a solid holds its shape, the atoms inside of a solid are not allowed to move around too much. This is one of the physical characteristics of solids. Atoms and molecules in liquids and gases are bouncing and floating around, free to move where they want. The molecules in a solid are stuck in place. The atoms still spin and the electrons will still fly around, but the entire atom will not change position. | 163.0 | 164.0 | 0.0 | NaN | 14.0 | 14.0 | 2.0 | 0.189476 | 0.535648 | 76.30 | 5.53 | 4.85 | 8.0 | 7.01 | 0.22618 | 0.22618 | 0.415339 | 20.641329 | Train |
| 4718 | 8026.0 | wikijunior | Liquid Basics | wikibooks | 2020.0 | Info | start | G | The second state of matter we will discuss is a liquid. Solids are hard things you can hold. Gases are floating around you and in bubbles. What is a liquid? Water is a liquid. Your blood is a liquid. Liquids are an in-between state of matter. They can be found in between the solid and gas states. They don't have to be made up of the same compounds. If you have a variety of materials in a liquid, it is called a solution.\nOne characteristic of a liquid is that it will fill up the shape of a container. If you pour some water in a cup, it will fill up the bottom of the cup first and then fill the rest. The water will also take the shape of the cup. It fills the bottom first because of gravity. The top part of a liquid will usually have a flat surface. That flat surface is because of gravity too. Putting an ice cube (solid) into a cup will leave you with a cube in the middle of the cup; the shape won't change until the ice becomes a liquid. | 189.0 | 190.0 | 0.0 | NaN | 17.0 | 17.0 | 2.0 | 0.255209 | 0.483866 | 86.03 | 4.05 | 2.40 | 7.0 | 6.43 | 0.15290 | 0.14475 | 0.468837 | 26.533802 | Train |
| 4719 | 8027.0 | wikijunior | Bugs/Monarch butterfly | wikibooks | 2019.0 | Info | start | G | The name Monarch means “king”. An adult Monarch Butterfly is about 1 ½ inches long. Its body is black with white markings. There are white spots on the head and around the wing edges. The wings are bright orange with black veins. The undersides of the wings are light orange. Male Monarchs have a black spot on the back of each hind wing.\nWings have 2 parts: a forewing and a hind wing. The wing span can be up to 4 inches across. The back edges of the wings are called “margins”. They bend to push air backward and move the butterfly forward. The stiff front edges of the wings lift the butterfly in flight. Black veins create a framework that keeps the wings stable. Female wing veins are thicker than those of males.\nMonarch Butterflies come from yellow, black, and white striped caterpillars. Monarch caterpillars grow to about 2 inches in length. They have 2 tentacles that look like antennae at the front of the body, and 2 tentacles at the back. | 171.0 | 173.0 | 0.0 | NaN | 17.0 | 17.0 | 3.0 | 0.423388 | 0.511439 | 87.37 | 3.59 | 4.37 | 7.0 | 6.71 | 0.13576 | 0.12908 | 0.442461 | 19.271483 | Test |
| 4720 | 8028.0 | wikijunior | Bugs/Walking Stick | wikibooks | 2020.0 | Info | start | G | Walking Sticks are long, thin, and slow-moving bugs, that looks like a stick, twig or branch. They are also called walking sticks. Males tend to be smaller than females. The colors are usually brown or green, but may be grey or shades of red. Also some are shaded orange, but in little places. Stripes, spots, and speckles are more common than solid. Males usually have wings, but females are most likely wingless. Short, tough forewings protect the larger fan-shaped hind wings.\nThe common American Walking Stick is slender and shiny with long antennas. The adult male is 2 to 3 inches long with bands of color,while the adult female is 4 to 5 inches long.\nThe New Guinea Spiny Stick Insect is big and bulky. It can grow to 4-1/2 inches to 6 inches long. It resembles a branch more than a slender stick. The colors are dark brown to black. Their legs are thick and prickly. Adult males have a long thorn on each hind leg. Nymphs, another type new type of walking stick, have green-and-brown patterns. | 176.0 | 178.0 | 0.0 | NaN | 17.0 | 17.0 | 3.0 | -0.614142 | 0.475506 | 85.42 | 4.02 | 4.32 | 7.0 | 7.62 | 0.08258 | 0.05378 | 0.468005 | 15.814468 | Test |
| 4721 | 8029.0 | wikijunior | Bugs/Black Widow | wikibooks | 2020.0 | Info | start | G | A Black Widow is a shiny black spider. It has an orange or red mark that looks like an hourglass. Its abdomen is shaped like a sphere and has an hourglass mark on the bottom. Often there are just two red marks separated by black. Females sometimes have the hourglass shape on top of the abdomen above the silk-spinning organs (spinnerets). Females are usually about 1-1/2 inches long including their leg span. In areas where grapes grow, females are very small and round. They resemble shiny black or red grapes.\nMale Black Widows are much smaller than females. Their bodies are only about 1/4 inch long. They can be either gray or black. They do not have an hourglass mark, but may have red spots on the abdomen.\nBlack widows are sometimes called “comb-footed” spiders. The bristles on their hind legs are used to cover trapped prey with silk.\nYoung spiders are called “spiderlings”. They shed their outer covering (exoskeleton) as they grow. Spiderlings are orange, brown, or white at first and get darker each time they shed their skin (molt). | 178.0 | 181.0 | 0.0 | NaN | 17.0 | 17.0 | 4.0 | 0.310336 | 0.508939 | 81.36 | 4.60 | 5.32 | 8.0 | 6.92 | 0.10992 | 0.08300 | 0.486970 | 22.731214 | Test |
| 4722 | 8030.0 | wikijunior | Solids | wikibooks | 2014.0 | Info | start | G | Solids are shapes that you can actually touch. They have three dimensions, which means that the have length, width and height. These shapes are what make up our daily life, and are very useful. Points on a solid must not be coplanar or colinear. The edge of solids are called the edge, and the surfaces are called faces. The corners, like angles and plane figures, are called vertices.\nA solid with only straight edges is called a polyhedron(pol-ee-HEE-dron). The plural form of polehedron is polyhedra(pol-ee-HEE-drah). Your chocolate bars are polyhedra, The Great Pyramids are polyhedra – a lot of things are. We will go into detail about them later.\nWhen dealing with these solid figures, there are two measurements we will need to know: the total surface area and the volume. The former is the sum of the faces of the solid; the latter is how big the solid is. | 148.0 | 150.0 | 0.0 | NaN | 12.0 | 12.0 | 3.0 | -0.215279 | 0.514128 | 75.83 | 5.89 | 5.84 | 9.0 | 7.74 | 0.18951 | 0.19583 | 0.381914 | 16.386932 | Train |
| 4723 | 8031.0 | wikijunior | Anials | wikibooks | 2018.0 | Info | start | G | Animals are made of many cells. They eat things and digest them inside. Most animals can move. Only animals have brains (though not even all animals do; jellyfish, for example, do not have brains).\nAnimals are found all over the earth. They dig in the ground, swim in the oceans, and fly in the sky.\nHumans are a type of animal. So are dogs, cats, cows, horses, frogs, fish, and so on and on.\nAnimals can be divided into two main groups, vertebrates and invertebrates. Vertebrates can be further divided into mammals, fish, birds, reptiles, and amphibians. Invertebrates can be divided into arthropods (like insects, spiders, and crabs), mollusks, sponges, several different kinds of worms, jellyfish — and quite a few other subgroups. There are at least thirty kinds of invertebrates, compared to the five kinds of vertebrates. Vertebrates have a backbone, while invertebrates do not. | 143.0 | 146.0 | 0.0 | NaN | 13.0 | 13.0 | 4.0 | 0.300779 | 0.512379 | 63.07 | 7.23 | 6.91 | 10.0 | 6.80 | 0.20880 | 0.20880 | 0.495853 | 14.830202 | Train |
| 4724 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4725 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Duplicate rows
Most frequently occurring
| | ID | Author | Title | Source | Pub Year | Category | Location | MPAA Max | Excerpt | Google WC | Joon WC v1 | British WC | British Words | Sentence Count v1 | Sentence Count v2 | Paragraphs | BT Easiness | BT s.e. | Flesch-Reading-Ease | Flesch-Kincaid-Grade-Level | Automated Readability Index | SMOG Readability | New Dale-Chall Readability Formula | CAREC | CAREC_M | CARES | CML2RI | Kaggle split | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 |
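The only duplicated row is the all-missing one (rows 4724 and 4725). It appears twice, which this table reports as "# duplicates" = 2, while the dataset overview counts 1 duplicate row, i.e. copies beyond the first. A minimal sketch of that counting on a hypothetical three-column excerpt (ID, Author, Kaggle split). Because float NaN never compares equal to itself, missing cells are normalized to `None` before rows are compared; this is an assumption of the sketch, not ydata-profiling's implementation:

```python
import math
from collections import Counter


def normalize(cell):
    # float NaN never equals itself, so map it to None before comparing rows.
    return None if isinstance(cell, float) and math.isnan(cell) else cell


def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(normalize(c) for c in row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical three-column excerpt (ID, Author, Kaggle split) of the rows above.
rows = [
    (400.0, "Carolyn Wells", "Train"),
    (8031.0, "wikijunior", "Train"),
    (float("nan"), float("nan"), float("nan")),
    (float("nan"), float("nan"), float("nan")),
]
dups = duplicate_rows(rows)
n_duplicate_rows = sum(n - 1 for n in dups.values())
print(dups)              # {(None, None, None): 2}
print(n_duplicate_rows)  # 1
```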